**Predicting the temperature of steel.**
Project description.
To optimize electricity consumption at the metallurgical plant, we need to build a model that predicts the temperature of the steel at the final stage of processing.
Data description.
The data consist of files obtained from different sources:
- data_arc_new.csv — electrode data;
- data_bulk_new.csv — bulk material additions (volume);
- data_bulk_time_new.csv — bulk material additions (time);
- data_gas_new.csv — gas purging of the alloy;
- data_temp_new.csv — temperature measurements;
- data_wire_new.csv — wire materials (volume);
- data_wire_time_new.csv — wire materials (time).

In every file the key column holds the batch number. A file may contain several rows with the same key; they correspond to different processing iterations.
The files are loaded into the following dataframes:

- df_arc — electrode data;
- df_bulk — bulk material additions (volume);
- df_bulk_time — bulk material additions (time);
- df_gas — gas purging of the alloy;
- df_temp — temperature measurements;
- df_wire — wire materials (volume);
- df_wire_time — wire materials (time).

# Import the required libraries
import os
import optuna
import numpy as np
import pandas as pd
import seaborn as sns
import plotly.express as px
import matplotlib.pyplot as plt
from time import time
from sklearn.svm import SVR
from lightgbm import LGBMRegressor
from catboost import CatBoostRegressor
from sklearn.dummy import DummyRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error as mae
from sklearn.model_selection import KFold, cross_val_score
# Define the constants
RANDOM_STATE = 140823
# List of local files
local_files = [
'data_arc_new.csv',
'data_bulk_new.csv',
'data_bulk_time_new.csv',
'data_gas_new.csv',
'data_temp_new.csv',
'data_wire_new.csv',
'data_wire_time_new.csv'
]
# List of online files
online_files = [
'https://***/data_arc_new.csv',
'https://***/data_bulk_new.csv',
'https://***/data_bulk_time_new.csv',
'https://***/data_gas_new.csv',
'https://***/data_temp_new.csv',
'https://***/data_wire_new.csv',
'https://***/data_wire_time_new.csv'
]
# Empty dictionary to hold the datasets
datasets = {}
# Loop over the paired local and online file lists
for local_file, online_file in zip(local_files, online_files):
    # Derive the dataset name from the file name
    dataset_name = local_file.split('.')[0]
    # Prefer the local copy if it exists...
    if os.path.exists(local_file):
        datasets[dataset_name] = pd.read_csv(local_file)
    # ...otherwise read the file from its URL
    else:
        datasets[dataset_name] = pd.read_csv(online_file)
    # Show the dataset and its summary
    print(f"{dataset_name}:")
    display(datasets[dataset_name])
    print(datasets[dataset_name].info())
    print('*' * 100)
# Assign the individual datasets to variables for convenience
df_arc = datasets['data_arc_new']
df_bulk = datasets['data_bulk_new']
df_bulk_time = datasets['data_bulk_time_new']
df_gas = datasets['data_gas_new']
df_temp = datasets['data_temp_new']
df_wire = datasets['data_wire_new']
df_wire_time = datasets['data_wire_time_new']
data_arc_new:
| key | Начало нагрева дугой | Конец нагрева дугой | Активная мощность | Реактивная мощность | |
|---|---|---|---|---|---|
| 0 | 1 | 2019-05-03 11:02:14 | 2019-05-03 11:06:02 | 0.305130 | 0.211253 |
| 1 | 1 | 2019-05-03 11:07:28 | 2019-05-03 11:10:33 | 0.765658 | 0.477438 |
| 2 | 1 | 2019-05-03 11:11:44 | 2019-05-03 11:14:36 | 0.580313 | 0.430460 |
| 3 | 1 | 2019-05-03 11:18:14 | 2019-05-03 11:24:19 | 0.518496 | 0.379979 |
| 4 | 1 | 2019-05-03 11:26:09 | 2019-05-03 11:28:37 | 0.867133 | 0.643691 |
| ... | ... | ... | ... | ... | ... |
| 14871 | 3241 | 2019-09-06 16:49:05 | 2019-09-06 16:51:42 | 0.439735 | 0.299579 |
| 14872 | 3241 | 2019-09-06 16:55:11 | 2019-09-06 16:58:11 | 0.646498 | 0.458240 |
| 14873 | 3241 | 2019-09-06 17:06:48 | 2019-09-06 17:09:52 | 1.039726 | 0.769302 |
| 14874 | 3241 | 2019-09-06 17:21:58 | 2019-09-06 17:22:55 | 0.530267 | 0.361543 |
| 14875 | 3241 | 2019-09-06 17:24:54 | 2019-09-06 17:26:15 | 0.389057 | 0.251347 |
14876 rows × 5 columns
<class 'pandas.core.frame.DataFrame'> RangeIndex: 14876 entries, 0 to 14875 Data columns (total 5 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 key 14876 non-null int64 1 Начало нагрева дугой 14876 non-null object 2 Конец нагрева дугой 14876 non-null object 3 Активная мощность 14876 non-null float64 4 Реактивная мощность 14876 non-null float64 dtypes: float64(2), int64(1), object(2) memory usage: 581.2+ KB None **************************************************************************************************** data_bulk_new:
| key | Bulk 1 | Bulk 2 | Bulk 3 | Bulk 4 | Bulk 5 | Bulk 6 | Bulk 7 | Bulk 8 | Bulk 9 | Bulk 10 | Bulk 11 | Bulk 12 | Bulk 13 | Bulk 14 | Bulk 15 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | NaN | NaN | NaN | 43.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 206.0 | NaN | 150.0 | 154.0 |
| 1 | 2 | NaN | NaN | NaN | 73.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 206.0 | NaN | 149.0 | 154.0 |
| 2 | 3 | NaN | NaN | NaN | 34.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 205.0 | NaN | 152.0 | 153.0 |
| 3 | 4 | NaN | NaN | NaN | 81.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 207.0 | NaN | 153.0 | 154.0 |
| 4 | 5 | NaN | NaN | NaN | 78.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 203.0 | NaN | 151.0 | 152.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3124 | 3237 | NaN | NaN | 170.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 252.0 | NaN | 130.0 | 206.0 |
| 3125 | 3238 | NaN | NaN | 126.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 254.0 | NaN | 108.0 | 106.0 |
| 3126 | 3239 | NaN | NaN | NaN | NaN | NaN | 114.0 | NaN | NaN | NaN | NaN | NaN | 158.0 | NaN | 270.0 | 88.0 |
| 3127 | 3240 | NaN | NaN | NaN | NaN | NaN | 26.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 192.0 | 54.0 |
| 3128 | 3241 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 180.0 | 52.0 |
3129 rows × 16 columns
<class 'pandas.core.frame.DataFrame'> RangeIndex: 3129 entries, 0 to 3128 Data columns (total 16 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 key 3129 non-null int64 1 Bulk 1 252 non-null float64 2 Bulk 2 22 non-null float64 3 Bulk 3 1298 non-null float64 4 Bulk 4 1014 non-null float64 5 Bulk 5 77 non-null float64 6 Bulk 6 576 non-null float64 7 Bulk 7 25 non-null float64 8 Bulk 8 1 non-null float64 9 Bulk 9 19 non-null float64 10 Bulk 10 176 non-null float64 11 Bulk 11 177 non-null float64 12 Bulk 12 2450 non-null float64 13 Bulk 13 18 non-null float64 14 Bulk 14 2806 non-null float64 15 Bulk 15 2248 non-null float64 dtypes: float64(15), int64(1) memory usage: 391.2 KB None **************************************************************************************************** data_bulk_time_new:
| key | Bulk 1 | Bulk 2 | Bulk 3 | Bulk 4 | Bulk 5 | Bulk 6 | Bulk 7 | Bulk 8 | Bulk 9 | Bulk 10 | Bulk 11 | Bulk 12 | Bulk 13 | Bulk 14 | Bulk 15 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | NaN | NaN | NaN | 2019-05-03 11:28:48 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2019-05-03 11:24:31 | NaN | 2019-05-03 11:14:50 | 2019-05-03 11:10:43 |
| 1 | 2 | NaN | NaN | NaN | 2019-05-03 11:36:50 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2019-05-03 11:53:30 | NaN | 2019-05-03 11:48:37 | 2019-05-03 11:44:39 |
| 2 | 3 | NaN | NaN | NaN | 2019-05-03 12:32:39 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2019-05-03 12:27:13 | NaN | 2019-05-03 12:21:01 | 2019-05-03 12:16:16 |
| 3 | 4 | NaN | NaN | NaN | 2019-05-03 12:43:22 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2019-05-03 12:58:00 | NaN | 2019-05-03 12:51:11 | 2019-05-03 12:46:36 |
| 4 | 5 | NaN | NaN | NaN | 2019-05-03 13:30:47 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2019-05-03 13:30:47 | NaN | 2019-05-03 13:34:12 | 2019-05-03 13:30:47 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3124 | 3237 | NaN | NaN | 2019-09-06 11:54:15 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2019-09-06 11:49:45 | NaN | 2019-09-06 11:45:22 | 2019-09-06 11:40:06 |
| 3125 | 3238 | NaN | NaN | 2019-09-06 12:26:52 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2019-09-06 12:18:35 | NaN | 2019-09-06 12:31:49 | 2019-09-06 12:26:52 |
| 3126 | 3239 | NaN | NaN | NaN | NaN | NaN | 2019-09-06 15:06:00 | NaN | NaN | NaN | NaN | NaN | 2019-09-06 15:01:44 | NaN | 2019-09-06 14:58:15 | 2019-09-06 14:48:06 |
| 3127 | 3240 | NaN | NaN | NaN | NaN | NaN | 2019-09-06 16:24:28 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2019-09-06 16:07:29 | 2019-09-06 16:01:34 |
| 3128 | 3241 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2019-09-06 17:26:33 | 2019-09-06 17:23:15 |
3129 rows × 16 columns
<class 'pandas.core.frame.DataFrame'> RangeIndex: 3129 entries, 0 to 3128 Data columns (total 16 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 key 3129 non-null int64 1 Bulk 1 252 non-null object 2 Bulk 2 22 non-null object 3 Bulk 3 1298 non-null object 4 Bulk 4 1014 non-null object 5 Bulk 5 77 non-null object 6 Bulk 6 576 non-null object 7 Bulk 7 25 non-null object 8 Bulk 8 1 non-null object 9 Bulk 9 19 non-null object 10 Bulk 10 176 non-null object 11 Bulk 11 177 non-null object 12 Bulk 12 2450 non-null object 13 Bulk 13 18 non-null object 14 Bulk 14 2806 non-null object 15 Bulk 15 2248 non-null object dtypes: int64(1), object(15) memory usage: 391.2+ KB None **************************************************************************************************** data_gas_new:
| key | Газ 1 | |
|---|---|---|
| 0 | 1 | 29.749986 |
| 1 | 2 | 12.555561 |
| 2 | 3 | 28.554793 |
| 3 | 4 | 18.841219 |
| 4 | 5 | 5.413692 |
| ... | ... | ... |
| 3234 | 3237 | 5.543905 |
| 3235 | 3238 | 6.745669 |
| 3236 | 3239 | 16.023518 |
| 3237 | 3240 | 11.863103 |
| 3238 | 3241 | 12.680959 |
3239 rows × 2 columns
<class 'pandas.core.frame.DataFrame'> RangeIndex: 3239 entries, 0 to 3238 Data columns (total 2 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 key 3239 non-null int64 1 Газ 1 3239 non-null float64 dtypes: float64(1), int64(1) memory usage: 50.7 KB None **************************************************************************************************** data_temp_new:
| key | Время замера | Температура | |
|---|---|---|---|
| 0 | 1 | 2019-05-03 11:02:04 | 1571.0 |
| 1 | 1 | 2019-05-03 11:07:18 | 1604.0 |
| 2 | 1 | 2019-05-03 11:11:34 | 1618.0 |
| 3 | 1 | 2019-05-03 11:18:04 | 1601.0 |
| 4 | 1 | 2019-05-03 11:25:59 | 1606.0 |
| ... | ... | ... | ... |
| 18087 | 3241 | 2019-09-06 16:55:01 | NaN |
| 18088 | 3241 | 2019-09-06 17:06:38 | NaN |
| 18089 | 3241 | 2019-09-06 17:21:48 | NaN |
| 18090 | 3241 | 2019-09-06 17:24:44 | NaN |
| 18091 | 3241 | 2019-09-06 17:30:05 | NaN |
18092 rows × 3 columns
<class 'pandas.core.frame.DataFrame'> RangeIndex: 18092 entries, 0 to 18091 Data columns (total 3 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 key 18092 non-null int64 1 Время замера 18092 non-null object 2 Температура 14665 non-null float64 dtypes: float64(1), int64(1), object(1) memory usage: 424.2+ KB None **************************************************************************************************** data_wire_new:
| key | Wire 1 | Wire 2 | Wire 3 | Wire 4 | Wire 5 | Wire 6 | Wire 7 | Wire 8 | Wire 9 | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 60.059998 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 2 | 96.052315 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 3 | 91.160157 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 4 | 89.063515 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 5 | 89.238236 | 9.11456 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3076 | 3237 | 38.088959 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3077 | 3238 | 56.128799 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3078 | 3239 | 143.357761 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3079 | 3240 | 34.070400 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3080 | 3241 | 63.117595 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3081 rows × 10 columns
<class 'pandas.core.frame.DataFrame'> RangeIndex: 3081 entries, 0 to 3080 Data columns (total 10 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 key 3081 non-null int64 1 Wire 1 3055 non-null float64 2 Wire 2 1079 non-null float64 3 Wire 3 63 non-null float64 4 Wire 4 14 non-null float64 5 Wire 5 1 non-null float64 6 Wire 6 73 non-null float64 7 Wire 7 11 non-null float64 8 Wire 8 19 non-null float64 9 Wire 9 29 non-null float64 dtypes: float64(9), int64(1) memory usage: 240.8 KB None **************************************************************************************************** data_wire_time_new:
| key | Wire 1 | Wire 2 | Wire 3 | Wire 4 | Wire 5 | Wire 6 | Wire 7 | Wire 8 | Wire 9 | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2019-05-03 11:06:19 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 2 | 2019-05-03 11:36:50 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 3 | 2019-05-03 12:11:46 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 4 | 2019-05-03 12:43:22 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 5 | 2019-05-03 13:20:44 | 2019-05-03 13:15:34 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3076 | 3237 | 2019-09-06 11:33:38 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3077 | 3238 | 2019-09-06 12:18:35 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3078 | 3239 | 2019-09-06 14:36:11 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3079 | 3240 | 2019-09-06 15:33:55 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3080 | 3241 | 2019-09-06 17:10:06 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3081 rows × 10 columns
<class 'pandas.core.frame.DataFrame'> RangeIndex: 3081 entries, 0 to 3080 Data columns (total 10 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 key 3081 non-null int64 1 Wire 1 3055 non-null object 2 Wire 2 1079 non-null object 3 Wire 3 63 non-null object 4 Wire 4 14 non-null object 5 Wire 5 1 non-null object 6 Wire 6 73 non-null object 7 Wire 7 11 non-null object 8 Wire 8 19 non-null object 9 Wire 9 29 non-null object dtypes: int64(1), object(9) memory usage: 240.8+ KB None ****************************************************************************************************
df_arc
The data contain the start and end of each arc-heating interval, plus the active and reactive power, for each key. There are 14876 records in total.
The timestamp columns are stored as strings (object) and need converting to datetime.
There are no missing values.
# Convert the timestamps from strings to datetime
df_arc['Начало нагрева дугой'] = pd.to_datetime(df_arc['Начало нагрева дугой'])
df_arc['Конец нагрева дугой'] = pd.to_datetime(df_arc['Конец нагрева дугой'])
df_arc.info()
df_arc.head()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 14876 entries, 0 to 14875 Data columns (total 5 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 key 14876 non-null int64 1 Начало нагрева дугой 14876 non-null datetime64[ns] 2 Конец нагрева дугой 14876 non-null datetime64[ns] 3 Активная мощность 14876 non-null float64 4 Реактивная мощность 14876 non-null float64 dtypes: datetime64[ns](2), float64(2), int64(1) memory usage: 581.2 KB
| key | Начало нагрева дугой | Конец нагрева дугой | Активная мощность | Реактивная мощность | |
|---|---|---|---|---|---|
| 0 | 1 | 2019-05-03 11:02:14 | 2019-05-03 11:06:02 | 0.305130 | 0.211253 |
| 1 | 1 | 2019-05-03 11:07:28 | 2019-05-03 11:10:33 | 0.765658 | 0.477438 |
| 2 | 1 | 2019-05-03 11:11:44 | 2019-05-03 11:14:36 | 0.580313 | 0.430460 |
| 3 | 1 | 2019-05-03 11:18:14 | 2019-05-03 11:24:19 | 0.518496 | 0.379979 |
| 4 | 1 | 2019-05-03 11:26:09 | 2019-05-03 11:28:37 | 0.867133 | 0.643691 |
df_bulk
The data describe the bulk materials added during processing; there are 3129 records in total. Columns Bulk 1…Bulk 15 hold the volumes of material added for each key.
Note that most columns contain missing values, which means that not every material was added for every batch.
df_bulk_time
The table holds the times at which bulk materials were added. It has 3129 rows and 16 columns, matching df_bulk. The key column identifies each record, and Bulk 1…Bulk 15 hold the timestamps of the corresponding additions. Some columns contain missing values: different batches received different materials, in different amounts (Bulk 14 was used 2806 times, Bulk 8 only once).
The Bulk 1…Bulk 15 columns are stored as object.
# Convert the timestamps from strings to datetime
bulk_columns = df_bulk_time.columns[1:]
for column in bulk_columns:
df_bulk_time[column] = pd.to_datetime(df_bulk_time[column])
df_bulk_time.info()
df_bulk_time.head()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 3129 entries, 0 to 3128 Data columns (total 16 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 key 3129 non-null int64 1 Bulk 1 252 non-null datetime64[ns] 2 Bulk 2 22 non-null datetime64[ns] 3 Bulk 3 1298 non-null datetime64[ns] 4 Bulk 4 1014 non-null datetime64[ns] 5 Bulk 5 77 non-null datetime64[ns] 6 Bulk 6 576 non-null datetime64[ns] 7 Bulk 7 25 non-null datetime64[ns] 8 Bulk 8 1 non-null datetime64[ns] 9 Bulk 9 19 non-null datetime64[ns] 10 Bulk 10 176 non-null datetime64[ns] 11 Bulk 11 177 non-null datetime64[ns] 12 Bulk 12 2450 non-null datetime64[ns] 13 Bulk 13 18 non-null datetime64[ns] 14 Bulk 14 2806 non-null datetime64[ns] 15 Bulk 15 2248 non-null datetime64[ns] dtypes: datetime64[ns](15), int64(1) memory usage: 391.2 KB
| key | Bulk 1 | Bulk 2 | Bulk 3 | Bulk 4 | Bulk 5 | Bulk 6 | Bulk 7 | Bulk 8 | Bulk 9 | Bulk 10 | Bulk 11 | Bulk 12 | Bulk 13 | Bulk 14 | Bulk 15 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | NaT | NaT | NaT | 2019-05-03 11:28:48 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-05-03 11:24:31 | NaT | 2019-05-03 11:14:50 | 2019-05-03 11:10:43 |
| 1 | 2 | NaT | NaT | NaT | 2019-05-03 11:36:50 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-05-03 11:53:30 | NaT | 2019-05-03 11:48:37 | 2019-05-03 11:44:39 |
| 2 | 3 | NaT | NaT | NaT | 2019-05-03 12:32:39 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-05-03 12:27:13 | NaT | 2019-05-03 12:21:01 | 2019-05-03 12:16:16 |
| 3 | 4 | NaT | NaT | NaT | 2019-05-03 12:43:22 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-05-03 12:58:00 | NaT | 2019-05-03 12:51:11 | 2019-05-03 12:46:36 |
| 4 | 5 | NaT | NaT | NaT | 2019-05-03 13:30:47 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-05-03 13:30:47 | NaT | 2019-05-03 13:34:12 | 2019-05-03 13:30:47 |
df_gas
The data describe purging the alloy with gas. The set contains 3239 rows and 2 columns: key and Газ 1. key is the batch number, and Газ 1 holds numeric values, presumably the volume of gas used for purging. Both columns already have appropriate dtypes.
df_temp
The file contains temperature measurements taken at different moments in time: 18092 rows and 3 columns:

- key — batch number;
- Время замера — date and time of the measurement;
- Температура — the measured temperature.

key links this table to the others. The frame summary shows that Температура has missing values (14665 non-null out of 18092 rows), and that Время замера is stored as object and must be converted to datetime for further analysis.
Conclusion: the data are ready for analysis once the missing values are handled and the Время замера dtype is fixed.
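As a sketch of one way the missing target values could later be handled (dropping rows without a temperature reading; this is an assumption about downstream preprocessing, and the toy frame below only mirrors the structure of data_temp_new with made-up values):

```python
import numpy as np
import pandas as pd

# Toy frame with the same columns as data_temp_new (values are made up)
df = pd.DataFrame({
    'key': [1, 1, 2],
    'Время замера': ['2019-05-03 11:02:04', '2019-05-03 11:07:18',
                     '2019-05-03 11:10:00'],
    'Температура': [1571.0, np.nan, 1604.0],
})

# Keep only the rows where the target temperature is present
cleaned = df.dropna(subset=['Температура'])
print(len(cleaned))  # → 2
```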
# Convert the timestamps from strings to datetime
df_temp['Время замера'] = pd.to_datetime(df_temp['Время замера'])
df_temp.info()
df_temp.head()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 18092 entries, 0 to 18091 Data columns (total 3 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 key 18092 non-null int64 1 Время замера 18092 non-null datetime64[ns] 2 Температура 14665 non-null float64 dtypes: datetime64[ns](1), float64(1), int64(1) memory usage: 424.2 KB
| key | Время замера | Температура | |
|---|---|---|---|
| 0 | 1 | 2019-05-03 11:02:04 | 1571.0 |
| 1 | 1 | 2019-05-03 11:07:18 | 1604.0 |
| 2 | 1 | 2019-05-03 11:11:34 | 1618.0 |
| 3 | 1 | 2019-05-03 11:18:04 | 1601.0 |
| 4 | 1 | 2019-05-03 11:25:59 | 1606.0 |
df_wire
The wire data describe wires 1-9 and their volumes: 3081 rows and 10 columns. Wire 1…Wire 9 hold the volume of wire used for each observation; key is the batch identifier.
The Wire 1…Wire 9 columns contain a fair number of missing values.
df_wire_time
This table holds the addition times for wire types 1-9: 3081 rows and 10 columns. Wire 1…Wire 9 hold the timestamps for each observation, key is the batch identifier, and the wire columns again contain missing values.
The Wire 1…Wire 9 columns are stored as object, i.e. as strings rather than datetimes.
# Convert the timestamps from strings to datetime
wire_columns = df_wire_time.columns[1:]
for column in wire_columns:
df_wire_time[column] = pd.to_datetime(df_wire_time[column])
df_wire_time.info()
df_wire_time.head()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 3081 entries, 0 to 3080 Data columns (total 10 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 key 3081 non-null int64 1 Wire 1 3055 non-null datetime64[ns] 2 Wire 2 1079 non-null datetime64[ns] 3 Wire 3 63 non-null datetime64[ns] 4 Wire 4 14 non-null datetime64[ns] 5 Wire 5 1 non-null datetime64[ns] 6 Wire 6 73 non-null datetime64[ns] 7 Wire 7 11 non-null datetime64[ns] 8 Wire 8 19 non-null datetime64[ns] 9 Wire 9 29 non-null datetime64[ns] dtypes: datetime64[ns](9), int64(1) memory usage: 240.8 KB
| key | Wire 1 | Wire 2 | Wire 3 | Wire 4 | Wire 5 | Wire 6 | Wire 7 | Wire 8 | Wire 9 | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2019-05-03 11:06:19 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT |
| 1 | 2 | 2019-05-03 11:36:50 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT |
| 2 | 3 | 2019-05-03 12:11:46 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT |
| 3 | 4 | 2019-05-03 12:43:22 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT |
| 4 | 5 | 2019-05-03 13:20:44 | 2019-05-03 13:15:34 | NaT | NaT | NaT | NaT | NaT | NaT | NaT |
df_arc — electrode data.
# Function that prints the sorted unique values of every column
def get_unique_sorted(df):
    # Iterate over every column in the dataframe
    for column in df.columns:
        # Unique values of the column, sorted
        unique_values = df[column].sort_values().unique()
        # Print the quoted column name (strings kept in Russian
        # to match the printed output below)
        print('Уникальные значения столбца', f"'{column}':")
        print(unique_values)
        print(f'Количество уникальных значений: {len(unique_values)}')
        # Separator line
        print('*' * 100)
get_unique_sorted(df_arc)
Уникальные значения столбца 'key': [ 1 2 3 ... 3239 3240 3241] Количество уникальных значений: 3214 **************************************************************************************************** Уникальные значения столбца 'Начало нагрева дугой': ['2019-05-03T11:02:14.000000000' '2019-05-03T11:07:28.000000000' '2019-05-03T11:11:44.000000000' ... '2019-09-06T17:06:48.000000000' '2019-09-06T17:21:58.000000000' '2019-09-06T17:24:54.000000000'] Количество уникальных значений: 14876 **************************************************************************************************** Уникальные значения столбца 'Конец нагрева дугой': ['2019-05-03T11:06:02.000000000' '2019-05-03T11:10:33.000000000' '2019-05-03T11:14:36.000000000' ... '2019-09-06T17:09:52.000000000' '2019-09-06T17:22:55.000000000' '2019-09-06T17:26:15.000000000'] Количество уникальных значений: 14876 **************************************************************************************************** Уникальные значения столбца 'Активная мощность': [0.22312 0.223238 0.223895 ... 1.444904 1.458773 1.463773] Количество уникальных значений: 13846 **************************************************************************************************** Уникальные значения столбца 'Реактивная мощность': [-7.15479924e+02 1.53777000e-01 1.53921000e-01 ... 1.22306300e+00 1.25862800e+00 1.27028400e+00] Количество уникальных значений: 14707 ****************************************************************************************************
# Summary statistics for every column in `df_arc`
df_arc.describe(include='all', datetime_is_numeric=True)
| key | Начало нагрева дугой | Конец нагрева дугой | Активная мощность | Реактивная мощность | |
|---|---|---|---|---|---|
| count | 14876.000000 | 14876 | 14876 | 14876.000000 | 14876.000000 |
| mean | 1615.220422 | 2019-07-05 12:25:51.921081088 | 2019-07-05 12:28:43.592027392 | 0.662752 | 0.438986 |
| min | 1.000000 | 2019-05-03 11:02:14 | 2019-05-03 11:06:02 | 0.223120 | -715.479924 |
| 25% | 806.000000 | 2019-06-03 23:18:23.249999872 | 2019-06-03 23:21:35 | 0.467115 | 0.337175 |
| 50% | 1617.000000 | 2019-07-03 01:31:26.500000 | 2019-07-03 01:35:13 | 0.599587 | 0.441639 |
| 75% | 2429.000000 | 2019-08-07 22:52:20.750000128 | 2019-08-07 22:56:47 | 0.830070 | 0.608201 |
| max | 3241.000000 | 2019-09-06 17:24:54 | 2019-09-06 17:26:15 | 1.463773 | 1.270284 |
| std | 934.571502 | NaN | NaN | 0.258885 | 5.873485 |
df_arc spans 2019-05-03 11:02:14 through 2019-09-06 17:26:15 and covers 3214 batches; repeat treatments explain why there are 14876 records. Active power runs from 0.223120 to 1.463773, reactive power from -715.479924 to 1.270284. There is a single negative reactive value, and it looks implausible: its magnitude is on the order of 1e+02, while all other values lie between 1e-01 and 1e+00.
Active and reactive power are two measured characteristics of electrical energy used to assess and control consumption in power-supply systems.
Active power is the power actually consumed or generated in the system. It is measured in watts (W) and represents the energy converted into useful work: lighting, motion, or loading of electrical equipment.
Reactive power is power that is handled or transferred by electrical equipment but not converted into useful work. It is measured in volt-amperes reactive (VAR) and is typically associated with components such as capacitors and inductors. Reactive power does no useful work directly, but it affects the efficiency and stability of the supply system.
Both quantities are usually measured and controlled to optimize consumption, improve energy efficiency, and reduce electricity costs.
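As a quick numeric illustration (not part of the dataset's pipeline; the values below are made up, merely on the scale seen in df_arc): active power P and reactive power Q combine into apparent power S = sqrt(P² + Q²), and the power factor is P / S.

```python
import math

# Illustrative readings for one heating interval (made-up values)
p_active = 0.66    # active power P
q_reactive = 0.44  # reactive power Q

# Apparent power S = sqrt(P^2 + Q^2)
s_apparent = math.hypot(p_active, q_reactive)
# Power factor cos(phi) = P / S
power_factor = p_active / s_apparent

print(round(s_apparent, 3), round(power_factor, 3))
```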
# Average number of treatments per batch
print('Среднее количество обработок для одной партии: ', round(
df_arc['key'].count() / df_arc['key'].nunique(), 2))
Среднее количество обработок для одной партии: 4.63
# Distribution of batches by number of treatments
df_count = df_arc.groupby('key')['key'].count().value_counts().reset_index()
df_count.columns = ['кол-во обработок', 'кол-во партий']
total_batches = df_count['кол-во партий'].sum()
df_count['% кол-ва партий'] = df_count['кол-во партий'] / total_batches * 100
plt.figure(figsize=(15, 5))
plt.barh(df_count['кол-во обработок'], df_count['кол-во партий'])
plt.xlabel('кол-во партий', fontsize=15, color='DarkSlateGray')
plt.ylabel('кол-во обработок', fontsize=15, color='DarkSlateGray')
plt.title('График количества обработок и партий',
fontsize=20, color='DarkSlateGray')
plt.minorticks_on()
plt.grid(which='minor', linestyle=':')
plt.grid(True)
plt.show()
df_count
| кол-во обработок | кол-во партий | % кол-ва партий | |
|---|---|---|---|
| 0 | 4 | 892 | 27.753578 |
| 1 | 5 | 759 | 23.615432 |
| 2 | 3 | 520 | 16.179216 |
| 3 | 6 | 490 | 15.245800 |
| 4 | 7 | 205 | 6.378345 |
| 5 | 2 | 174 | 5.413815 |
| 6 | 8 | 84 | 2.613566 |
| 7 | 1 | 39 | 1.213441 |
| 8 | 9 | 28 | 0.871189 |
| 9 | 10 | 9 | 0.280025 |
| 10 | 11 | 5 | 0.155569 |
| 11 | 12 | 3 | 0.093342 |
| 12 | 13 | 2 | 0.062228 |
| 13 | 15 | 2 | 0.062228 |
| 14 | 14 | 1 | 0.031114 |
| 15 | 16 | 1 | 0.031114 |
Analysing the treatment counts: most batches go through 4 or 5 treatments (about 28% and 24% of all batches respectively), and the average is 4.63 treatments per batch.
# See how power is distributed over time
plt.figure(figsize=(15, 5))
plt.plot(df_arc['Начало нагрева дугой'],
df_arc['Активная мощность'])
plt.title('Линейный график активной мощности по началу нагрева дугой',
fontsize=15, color='DarkSlateGray')
plt.xlabel('Начало нагрева дугой',
fontsize=12,
color='DarkSlateGray')
plt.ylabel('Активная мощность',
fontsize=12,
color='DarkSlateGray')
plt.minorticks_on()
plt.grid(which='minor', linestyle=':')
plt.grid(True)
plt.show()
There is a conspicuous gap around 2019-07-15.
# Zoom in on the gap
px.line(df_arc, x='Конец нагрева дугой',
y='Активная мощность', width=900, height=400).show()
px.line(df_arc, x='Начало нагрева дугой',
y='Активная мощность', width=900, height=400).show()
The stretch from 13 to 18 July stands out: there are no records at all in that interval (possibly some equipment outage).
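The outage can also be measured rather than eyeballed: the longest interval between consecutive heat-start timestamps bounds the silent period. A minimal sketch on synthetic timestamps (in the notebook the same idiom would run on df_arc['Начало нагрева дугой']):

```python
import pandas as pd

# Synthetic start times with a multi-day hole, mimicking the July gap
starts = pd.to_datetime(pd.Series([
    '2019-07-12 23:50:00',
    '2019-07-13 00:10:00',
    '2019-07-18 08:00:00',  # first record after the outage
    '2019-07-18 08:20:00',
]))

# Longest pause between consecutive records
gaps = starts.sort_values().diff()
print(gaps.max())  # → 5 days 07:50:00
```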
# Records with negative reactive power
df_arc.query('`Реактивная мощность` < 0')
| key | Начало нагрева дугой | Конец нагрева дугой | Активная мощность | Реактивная мощность | |
|---|---|---|---|---|---|
| 9780 | 2116 | 2019-07-28 02:22:08 | 2019-07-28 02:23:57 | 0.705344 | -715.479924 |
To replace it sensibly, we compute the correlation between active and reactive power with this value excluded (intuition suggests the correlation will be high).
'''
Filter out the rows with negative reactive power and compute the
correlation between active and reactive power on the rest
'''
df_arc.query('`Реактивная мощность` > 0')[
['Активная мощность', 'Реактивная мощность']].corr().round(2)
| Активная мощность | Реактивная мощность | |
|---|---|---|
| Активная мощность | 1.00 | 0.97 |
| Реактивная мощность | 0.97 | 1.00 |
Active and reactive power show a strong positive correlation.
# Расчитаем реактивную мощность чтоб заменить отрицательное значение
df_arc.loc[df_arc['Реактивная мощность'] < 0,
'Реактивная мощность'] = df_arc['Активная мощность'] * (df_arc.query('`Реактивная мощность` > 0')['Реактивная мощность'] /
df_arc.query('`Реактивная мощность` > 0')['Активная мощность']).mean().round(6)
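The ratio-based replacement above can be illustrated on toy data (a minimal sketch; the `active`/`reactive` column names and all numbers here are made up):

```python
import pandas as pd

# Toy data with one corrupted, negative reactive-power reading (row 2)
df = pd.DataFrame({'active':   [0.5,  0.8,   0.7],
                   'reactive': [0.35, 0.56, -715.0]})

# Mean reactive/active ratio computed over the valid rows only
valid = df['reactive'] > 0
ratio = (df.loc[valid, 'reactive'] / df.loc[valid, 'active']).mean()

# Replace the negative reading with active power times the mean ratio
df.loc[df['reactive'] < 0, 'reactive'] = df['active'] * ratio
```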
px.line(df_arc, x='Конец нагрева дугой',
y='Реактивная мощность', width=900, height=400).show()
px.line(df_arc, x='Начало нагрева дугой',
y='Реактивная мощность', width=900, height=400).show()
Now there are no values that clearly stand out from the rest.
Let's verify this.
# Create the figure
plt.figure(figsize=(15, 8))
# Split the figure into 2 subplots
# First subplot
plt.subplot(2, 1, 1)
# Histogram of 'Активная мощность'
sns.histplot(data=df_arc['Активная мощность'],
             color='green', alpha=0.8, label='Активная мощность')
# Histogram of 'Реактивная мощность'
sns.histplot(data=df_arc['Реактивная мощность'],
             color='red', alpha=0.5, label='Реактивная мощность')
plt.minorticks_on()
plt.grid(which='minor', linestyle=':')
plt.grid(True)
plt.xlabel('')
plt.ylabel('Частота', fontsize=12, color='DarkSlateGray')
plt.legend()
# Second subplot
plt.subplot(2, 1, 2)
# Boxplot of 'Активная мощность' and 'Реактивная мощность'
sns.boxplot(data=df_arc[['Активная мощность', 'Реактивная мощность']],
            orient='h', palette=['green', 'red'])
plt.minorticks_on()
plt.grid(which='minor', linestyle=':')
plt.grid(True)
plt.xlabel('Мощность', fontsize=12, color='DarkSlateGray')
plt.ylabel('')
plt.suptitle('Распределение мощности (активной и реактивной)',
fontsize=15, color='DarkSlateGray')
plt.show()
Indeed, no anomalies are observed.
The end time of each heating should be later than its start time; let's verify this condition.
(df_arc['Конец нагрева дугой'] - df_arc['Начало нагрева дугой']).describe()
count                        14876
mean     0 days 00:02:51.670946490
std      0 days 00:01:38.186802680
min                0 days 00:00:11
25%                0 days 00:01:47
50%                0 days 00:02:27
75%                0 days 00:03:34
max                0 days 00:15:07
dtype: object
On average a heating iteration lasts just under 3 minutes (mean ≈ 2 min 52 s, median ≈ 2 min 27 s); the shortest lasts 11 seconds and the longest just over 15 minutes.
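When such durations need to be turned into seconds later on, `Series.dt.total_seconds()` is the portable pandas way; a minimal sketch with two example timestamps (the values are illustrative):

```python
import pandas as pd

start = pd.Series(pd.to_datetime(['2019-05-03 11:02:14']))
end = pd.Series(pd.to_datetime(['2019-05-03 11:06:02']))

# Duration in seconds as a float Series
seconds = (end - start).dt.total_seconds()
```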
We have been told that the two powers (active and reactive) are measured in the same units.
Besides active and reactive power, electrical engineering widely uses the notion of apparent power. Active, reactive, and apparent power are related by $$S=\sqrt{P^2+Q^2}$$ where $S$ is the apparent power, $P$ the active power, and $Q$ the reactive power.
# Add the `Полная мощность` (apparent power) column
df_arc['Полная мощность'] = np.sqrt(
df_arc['Активная мощность']**2 + df_arc['Реактивная мощность']**2)
We can compute the arc duration in seconds (the difference between the heating end and start times) and, multiplying it by the apparent power, obtain the consumed energy.
# Compute the arc duration in seconds
df_arc['Продолжительность дуги'] = (
    df_arc['Конец нагрева дугой'] - df_arc['Начало нагрева дугой']).dt.total_seconds()
# Compute the consumed energy
df_arc['Потребляемая энергия'] = df_arc['Полная мощность'] * \
    df_arc['Продолжительность дуги']
df_arc
| | key | Начало нагрева дугой | Конец нагрева дугой | Активная мощность | Реактивная мощность | Полная мощность | Продолжительность дуги | Потребляемая энергия |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2019-05-03 11:02:14 | 2019-05-03 11:06:02 | 0.305130 | 0.211253 | 0.371123 | 228.0 | 84.616003 |
| 1 | 1 | 2019-05-03 11:07:28 | 2019-05-03 11:10:33 | 0.765658 | 0.477438 | 0.902319 | 185.0 | 166.928978 |
| 2 | 1 | 2019-05-03 11:11:44 | 2019-05-03 11:14:36 | 0.580313 | 0.430460 | 0.722536 | 172.0 | 124.276277 |
| 3 | 1 | 2019-05-03 11:18:14 | 2019-05-03 11:24:19 | 0.518496 | 0.379979 | 0.642824 | 365.0 | 234.630603 |
| 4 | 1 | 2019-05-03 11:26:09 | 2019-05-03 11:28:37 | 0.867133 | 0.643691 | 1.079934 | 148.0 | 159.830252 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 14871 | 3241 | 2019-09-06 16:49:05 | 2019-09-06 16:51:42 | 0.439735 | 0.299579 | 0.532085 | 157.0 | 83.537345 |
| 14872 | 3241 | 2019-09-06 16:55:11 | 2019-09-06 16:58:11 | 0.646498 | 0.458240 | 0.792429 | 180.0 | 142.637202 |
| 14873 | 3241 | 2019-09-06 17:06:48 | 2019-09-06 17:09:52 | 1.039726 | 0.769302 | 1.293389 | 184.0 | 237.983620 |
| 14874 | 3241 | 2019-09-06 17:21:58 | 2019-09-06 17:22:55 | 0.530267 | 0.361543 | 0.641792 | 57.0 | 36.582120 |
| 14875 | 3241 | 2019-09-06 17:24:54 | 2019-09-06 17:26:15 | 0.389057 | 0.251347 | 0.463185 | 81.0 | 37.518013 |
14876 rows × 8 columns
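The derived columns can be sanity-checked against row 0 of the table above (a minimal sketch; the numbers are copied from that row):

```python
import math

# Row 0 of df_arc: P = 0.305130, Q = 0.211253, duration = 228 s
p, q, duration = 0.305130, 0.211253, 228.0

s = math.sqrt(p ** 2 + q ** 2)  # apparent power S = sqrt(P^2 + Q^2)
energy = s * duration           # consumed energy = S * duration
```

Both values reproduce the `Полная мощность` and `Потребляемая энергия` entries of that row.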
Aggregate the resulting data by key. For now, keep the time component of the electrode processing by taking the first and last processing timestamps, and add a column with the total number of processing iterations.
agg_func_arc = {
'key': 'count',
'Начало нагрева дугой': 'first',
'Конец нагрева дугой': 'last',
'Активная мощность': 'sum',
'Реактивная мощность': 'sum',
'Полная мощность': 'sum',
'Продолжительность дуги': 'sum',
'Потребляемая энергия': 'sum'
}
df_arc_key = df_arc.groupby('key').agg(agg_func_arc).rename(columns={'key':'количество этапов обработки'})
df_arc_key
| | количество этапов обработки | Начало нагрева дугой | Конец нагрева дугой | Активная мощность | Реактивная мощность | Полная мощность | Продолжительность дуги | Потребляемая энергия |
|---|---|---|---|---|---|---|---|---|
| key | ||||||||
| 1 | 5 | 2019-05-03 11:02:14 | 2019-05-03 11:28:37 | 3.036730 | 2.142821 | 3.718736 | 1098.0 | 770.282114 |
| 2 | 4 | 2019-05-03 11:34:14 | 2019-05-03 11:53:18 | 2.139408 | 1.453357 | 2.588349 | 811.0 | 481.760005 |
| 3 | 5 | 2019-05-03 12:06:54 | 2019-05-03 12:32:19 | 4.063641 | 2.937457 | 5.019223 | 655.0 | 722.837668 |
| 4 | 4 | 2019-05-03 12:39:37 | 2019-05-03 12:57:50 | 2.706489 | 2.056992 | 3.400038 | 741.0 | 683.455597 |
| 5 | 4 | 2019-05-03 13:11:13 | 2019-05-03 13:33:55 | 2.252950 | 1.687991 | 2.816980 | 869.0 | 512.169934 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3237 | 5 | 2019-09-06 11:31:25 | 2019-09-06 11:53:55 | 2.541872 | 2.025417 | 3.250657 | 909.0 | 630.503534 |
| 3238 | 3 | 2019-09-06 12:16:25 | 2019-09-06 12:31:35 | 1.374821 | 1.038103 | 1.723937 | 546.0 | 286.052252 |
| 3239 | 8 | 2019-09-06 14:17:00 | 2019-09-06 15:05:50 | 4.848005 | 3.541541 | 6.014480 | 1216.0 | 941.538764 |
| 3240 | 5 | 2019-09-06 15:25:31 | 2019-09-06 16:24:15 | 3.317679 | 2.373552 | 4.082920 | 839.0 | 657.439848 |
| 3241 | 5 | 2019-09-06 16:49:05 | 2019-09-06 17:26:15 | 3.045283 | 2.140011 | 3.722880 | 659.0 | 538.258300 |
3214 rows × 8 columns
# Rename the columns to the conventional snake_case style
df_arc_key.columns = ['count_arc', 'start_arc', 'end_arc', 'active_power',
'reactive_power', 'apparent_power', 'arc_duration', 'energy_consumption']
df_arc_key
| | count_arc | start_arc | end_arc | active_power | reactive_power | apparent_power | arc_duration | energy_consumption |
|---|---|---|---|---|---|---|---|---|
| key | ||||||||
| 1 | 5 | 2019-05-03 11:02:14 | 2019-05-03 11:28:37 | 3.036730 | 2.142821 | 3.718736 | 1098.0 | 770.282114 |
| 2 | 4 | 2019-05-03 11:34:14 | 2019-05-03 11:53:18 | 2.139408 | 1.453357 | 2.588349 | 811.0 | 481.760005 |
| 3 | 5 | 2019-05-03 12:06:54 | 2019-05-03 12:32:19 | 4.063641 | 2.937457 | 5.019223 | 655.0 | 722.837668 |
| 4 | 4 | 2019-05-03 12:39:37 | 2019-05-03 12:57:50 | 2.706489 | 2.056992 | 3.400038 | 741.0 | 683.455597 |
| 5 | 4 | 2019-05-03 13:11:13 | 2019-05-03 13:33:55 | 2.252950 | 1.687991 | 2.816980 | 869.0 | 512.169934 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3237 | 5 | 2019-09-06 11:31:25 | 2019-09-06 11:53:55 | 2.541872 | 2.025417 | 3.250657 | 909.0 | 630.503534 |
| 3238 | 3 | 2019-09-06 12:16:25 | 2019-09-06 12:31:35 | 1.374821 | 1.038103 | 1.723937 | 546.0 | 286.052252 |
| 3239 | 8 | 2019-09-06 14:17:00 | 2019-09-06 15:05:50 | 4.848005 | 3.541541 | 6.014480 | 1216.0 | 941.538764 |
| 3240 | 5 | 2019-09-06 15:25:31 | 2019-09-06 16:24:15 | 3.317679 | 2.373552 | 4.082920 | 839.0 | 657.439848 |
| 3241 | 5 | 2019-09-06 16:49:05 | 2019-09-06 17:26:15 | 3.045283 | 2.140011 | 3.722880 | 659.0 | 538.258300 |
3214 rows × 8 columns
df_bulk — data on bulk material additions (volume)
Check the key column for unique values and set it as the dataframe index.
# Check the `key` column for duplicate values
df_bulk['key'].duplicated().sum()
0
# Set the `key` column as the dataframe index
df_bulk = df_bulk.set_index('key')
# Summary statistics for all columns of `df_bulk`
df_bulk.describe()
| | Bulk 1 | Bulk 2 | Bulk 3 | Bulk 4 | Bulk 5 | Bulk 6 | Bulk 7 | Bulk 8 | Bulk 9 | Bulk 10 | Bulk 11 | Bulk 12 | Bulk 13 | Bulk 14 | Bulk 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 252.000000 | 22.000000 | 1298.000000 | 1014.000000 | 77.000000 | 576.000000 | 25.000000 | 1.0 | 19.000000 | 176.000000 | 177.000000 | 2450.000000 | 18.000000 | 2806.000000 | 2248.000000 |
| mean | 39.242063 | 253.045455 | 113.879045 | 104.394477 | 107.025974 | 118.925347 | 305.600000 | 49.0 | 76.315789 | 83.284091 | 76.819209 | 260.471020 | 181.111111 | 170.284747 | 160.513345 |
| std | 18.277654 | 21.180578 | 75.483494 | 48.184126 | 81.790646 | 72.057776 | 191.022904 | NaN | 21.720581 | 26.060347 | 59.655365 | 120.649269 | 46.088009 | 65.868652 | 51.765319 |
| min | 10.000000 | 228.000000 | 6.000000 | 12.000000 | 11.000000 | 17.000000 | 47.000000 | 49.0 | 63.000000 | 24.000000 | 8.000000 | 53.000000 | 151.000000 | 16.000000 | 1.000000 |
| 25% | 27.000000 | 242.000000 | 58.000000 | 72.000000 | 70.000000 | 69.750000 | 155.000000 | 49.0 | 66.000000 | 64.000000 | 25.000000 | 204.000000 | 153.250000 | 119.000000 | 105.000000 |
| 50% | 31.000000 | 251.500000 | 97.500000 | 102.000000 | 86.000000 | 100.000000 | 298.000000 | 49.0 | 68.000000 | 86.500000 | 64.000000 | 208.000000 | 155.500000 | 151.000000 | 160.000000 |
| 75% | 46.000000 | 257.750000 | 152.000000 | 133.000000 | 132.000000 | 157.000000 | 406.000000 | 49.0 | 70.500000 | 102.000000 | 106.000000 | 316.000000 | 203.500000 | 205.750000 | 205.000000 |
| max | 185.000000 | 325.000000 | 454.000000 | 281.000000 | 603.000000 | 503.000000 | 772.000000 | 49.0 | 147.000000 | 159.000000 | 313.000000 | 1849.000000 | 305.000000 | 636.000000 | 405.000000 |
The volume of bulk material additions varies from batch to batch. Bulk 8 was registered for only one batch, while Bulk 14 appears in 2806.
Mean addition volumes range from 39.24 (Bulk 1) to 305.6 (Bulk 7).
Standard deviations range from 18.28 to 191.02, indicating a considerable spread within each material.
Per-column minimums range from 1 to 228 and maximums from 49 to 1849, pointing to a large gap between the smallest and largest additions of each material.
Median volumes range from 31 to 298.
First-quartile (25%) values range from 25 to 242, and third-quartile (75%) values from 46 to 406, again indicating a wide spread and substantial differences between the lower and upper quartiles.
Overall, the bulk-material data show high variability both within and between materials, which may be related to the wide product range (different production conditions and process requirements).
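The very different per-column counts (Bulk 8: 1 record, Bulk 14: 2806) suggest also looking at fill rates; a minimal sketch on toy data (the values are made up, only the column names match the project):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'Bulk 8':  [np.nan, 49.0,  np.nan, np.nan],
                    'Bulk 14': [150.0,  149.0, np.nan, 270.0]})

# Non-missing count and share of non-missing values per column
counts = toy.count()
fill_rate = toy.notna().mean()
```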
# Look at the single record with a `Bulk 8` addition
df_bulk[df_bulk['Bulk 8'].notna()]
| | Bulk 1 | Bulk 2 | Bulk 3 | Bulk 4 | Bulk 5 | Bulk 6 | Bulk 7 | Bulk 8 | Bulk 9 | Bulk 10 | Bulk 11 | Bulk 12 | Bulk 13 | Bulk 14 | Bulk 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| key | |||||||||||||||
| 1786 | NaN | NaN | 123.0 | NaN | NaN | NaN | NaN | 49.0 | 147.0 | NaN | NaN | NaN | NaN | NaN | NaN |
# Visualize the distributions of the `df_bulk` columns
plt.figure(figsize=(15, 10))
sns.boxplot(data=df_bulk, orient='h')
plt.xlabel('Объем подачи сыпучих материалов',
fontsize=12, color='DarkSlateGray')
plt.ylabel('Материал', fontsize=12, color='DarkSlateGray')
plt.title('Boxplot подачи сыпучих материалов',
fontsize=15, color='DarkSlateGray')
plt.minorticks_on()
plt.grid(which='minor', linestyle=':')
plt.grid(True)
Everything looks fine, but Bulk 5 and Bulk 12 each have isolated values that stand far out from the rest.
Let's build a zoomable chart so the points of interest can be inspected visually.
fig = px.box(df_bulk)
fig.update_layout(
width=800,
height=500,
xaxis=dict(title='Материал'),
yaxis=dict(title='Объем подачи сыпучих материалов'),
title='Boxplot подачи сыпучих материалов'
)
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='lightgray')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='lightgray')
fig.show()
# Look at the unique values of the `df_bulk` columns
get_unique_sorted(df_bulk)
(Output truncated: sorted unique values for each column. Unique-value counts, including NaN: Bulk 1 — 48, Bulk 2 — 16, Bulk 3 — 279, Bulk 4 — 207, Bulk 5 — 56, Bulk 6 — 206, Bulk 7 — 26, Bulk 8 — 2, Bulk 9 — 11, Bulk 10 — 78, Bulk 11 — 102, Bulk 12 — 332, Bulk 13 — 15, Bulk 14 — 285, Bulk 15 — 157.)
There are some points that need to be clarified and agreed upon with the customer's representative (the list of questions is at the end of the preprocessing section).
Replace the missing values with zeros. Compute the sum of all volumes for each batch as an extra feature (so we know the total volume of additives in the melt) and bring the table to the conventional naming style.
# Fill the NaN values in the DataFrame with zeros
df_bulk = df_bulk.fillna(0)
# Sum each row into a new "bulk_sum" column
df_bulk['bulk_sum'] = df_bulk.sum(axis=1)
# Replace spaces in the column names with underscores and lowercase them
df_bulk.columns = [x.replace(' ', '_').lower() for x in df_bulk.columns]
df_bulk
| | bulk_1 | bulk_2 | bulk_3 | bulk_4 | bulk_5 | bulk_6 | bulk_7 | bulk_8 | bulk_9 | bulk_10 | bulk_11 | bulk_12 | bulk_13 | bulk_14 | bulk_15 | bulk_sum |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| key | ||||||||||||||||
| 1 | 0.0 | 0.0 | 0.0 | 43.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 206.0 | 0.0 | 150.0 | 154.0 | 553.0 |
| 2 | 0.0 | 0.0 | 0.0 | 73.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 206.0 | 0.0 | 149.0 | 154.0 | 582.0 |
| 3 | 0.0 | 0.0 | 0.0 | 34.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 205.0 | 0.0 | 152.0 | 153.0 | 544.0 |
| 4 | 0.0 | 0.0 | 0.0 | 81.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 207.0 | 0.0 | 153.0 | 154.0 | 595.0 |
| 5 | 0.0 | 0.0 | 0.0 | 78.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 203.0 | 0.0 | 151.0 | 152.0 | 584.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3237 | 0.0 | 0.0 | 170.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 252.0 | 0.0 | 130.0 | 206.0 | 758.0 |
| 3238 | 0.0 | 0.0 | 126.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 254.0 | 0.0 | 108.0 | 106.0 | 594.0 |
| 3239 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 114.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 158.0 | 0.0 | 270.0 | 88.0 | 630.0 |
| 3240 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 26.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 192.0 | 54.0 | 272.0 |
| 3241 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 180.0 | 52.0 | 232.0 |
3129 rows × 16 columns
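The fill-then-sum step above can be sketched on toy data (column names match the renamed table, the numbers are made up):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'Bulk 12': [206.0, np.nan],
                    'Bulk 14': [150.0, 149.0]})

# NaN means "material not added", so treat it as zero volume
toy = toy.fillna(0)

# Total volume of additives per batch
toy['bulk_sum'] = toy.sum(axis=1)

# snake_case column names
toy.columns = [c.replace(' ', '_').lower() for c in toy.columns]
```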
df_bulk_time — data on bulk material additions (time)
# Check the `key` column for duplicate values
df_bulk_time['key'].duplicated().sum()
0
# Set the `key` column as the dataframe index
df_bulk_time = df_bulk_time.set_index('key')
# Look at the unique values of the `df_bulk_time` columns
get_unique_sorted(df_bulk_time)
(Output truncated: sorted unique timestamps per column. For 'Bulk 1' the values run from 2019-05-03 to 2019-08-31, with no records between 13 and 23 July.)
'2019-08-31T19:44:46.000000000' '2019-09-01T17:59:26.000000000'
'2019-09-01T18:45:39.000000000' '2019-09-01T19:08:00.000000000'
'2019-09-02T16:40:49.000000000' '2019-09-02T22:44:37.000000000'
'2019-09-03T09:24:41.000000000' '2019-09-04T04:48:19.000000000'
'2019-09-04T16:01:08.000000000' '2019-09-04T19:33:27.000000000'
'2019-09-04T20:39:01.000000000' '2019-09-04T20:51:19.000000000'
'2019-09-05T01:56:45.000000000' '2019-09-05T02:37:08.000000000'
'2019-09-05T06:10:59.000000000' '2019-09-05T09:11:32.000000000'
'NaT']
Количество уникальных значений: 253
****************************************************************************************************
Unique values of column 'Bulk 2':
[list of 22 unique timestamps and 'NaT' omitted]
Number of unique values: 23
****************************************************************************************************
Unique values of column 'Bulk 3':
[list of 1298 unique timestamps and 'NaT' omitted]
Number of unique values: 1299
****************************************************************************************************
Unique values of column 'Bulk 4':
[list of 1014 unique timestamps and 'NaT' omitted]
Number of unique values: 1015
****************************************************************************************************
Unique values of column 'Bulk 5':
[list of 77 unique timestamps and 'NaT' omitted]
Number of unique values: 78
****************************************************************************************************
Unique values of column 'Bulk 6':
[list of 576 unique timestamps and 'NaT' omitted]
Number of unique values: 577
****************************************************************************************************
Unique values of column 'Bulk 7':
[list of 25 unique timestamps and 'NaT' omitted]
Number of unique values: 26
****************************************************************************************************
Unique values of column 'Bulk 8':
['2019-07-08T17:14:53.000000000' 'NaT']
Number of unique values: 2
****************************************************************************************************
Unique values of column 'Bulk 9':
[list of 19 unique timestamps and 'NaT' omitted]
Number of unique values: 20
****************************************************************************************************
Unique values of column 'Bulk 10':
[list of 176 unique timestamps and 'NaT' omitted]
Number of unique values: 177
****************************************************************************************************
Уникальные значения столбца 'Bulk 11':
['2019-05-05T23:43:24.000000000' '2019-05-06T16:18:00.000000000'
 '2019-05-07T00:01:30.000000000' ... '2019-09-06T03:54:36.000000000'
 '2019-09-06T05:03:14.000000000' 'NaT']
Количество уникальных значений: 178
****************************************************************************************************
Уникальные значения столбца 'Bulk 12':
['2019-05-03T11:24:31.000000000' '2019-05-03T11:53:30.000000000'
'2019-05-03T12:27:13.000000000' ... '2019-09-06T12:18:35.000000000'
'2019-09-06T15:01:44.000000000' 'NaT']
Количество уникальных значений: 2451
****************************************************************************************************
Уникальные значения столбца 'Bulk 13':
['2019-05-05T02:10:21.000000000' '2019-05-11T03:35:36.000000000'
'2019-05-11T08:18:48.000000000' '2019-05-16T05:29:33.000000000'
'2019-05-28T08:15:29.000000000' '2019-06-01T18:07:38.000000000'
'2019-06-09T08:18:22.000000000' '2019-06-21T20:10:19.000000000'
'2019-06-30T14:38:12.000000000' '2019-07-02T11:21:01.000000000'
'2019-07-05T14:20:41.000000000' '2019-07-11T03:10:43.000000000'
'2019-07-29T02:58:13.000000000' '2019-08-07T06:23:16.000000000'
'2019-08-22T14:48:22.000000000' '2019-08-26T07:00:42.000000000'
'2019-08-29T14:49:47.000000000' '2019-09-01T01:53:02.000000000'
'NaT']
Количество уникальных значений: 19
****************************************************************************************************
Уникальные значения столбца 'Bulk 14':
['2019-05-03T11:14:50.000000000' '2019-05-03T11:48:37.000000000'
'2019-05-03T12:21:01.000000000' ... '2019-09-06T16:07:29.000000000'
'2019-09-06T17:26:33.000000000' 'NaT']
Количество уникальных значений: 2807
****************************************************************************************************
Уникальные значения столбца 'Bulk 15':
['2019-05-03T11:10:43.000000000' '2019-05-03T11:44:39.000000000'
'2019-05-03T12:16:16.000000000' ... '2019-09-06T16:01:34.000000000'
'2019-09-06T17:23:15.000000000' 'NaT']
Количество уникальных значений: 2249
****************************************************************************************************
df_bulk_time
| Bulk 1 | Bulk 2 | Bulk 3 | Bulk 4 | Bulk 5 | Bulk 6 | Bulk 7 | Bulk 8 | Bulk 9 | Bulk 10 | Bulk 11 | Bulk 12 | Bulk 13 | Bulk 14 | Bulk 15 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| key | |||||||||||||||
| 1 | NaT | NaT | NaT | 2019-05-03 11:28:48 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-05-03 11:24:31 | NaT | 2019-05-03 11:14:50 | 2019-05-03 11:10:43 |
| 2 | NaT | NaT | NaT | 2019-05-03 11:36:50 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-05-03 11:53:30 | NaT | 2019-05-03 11:48:37 | 2019-05-03 11:44:39 |
| 3 | NaT | NaT | NaT | 2019-05-03 12:32:39 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-05-03 12:27:13 | NaT | 2019-05-03 12:21:01 | 2019-05-03 12:16:16 |
| 4 | NaT | NaT | NaT | 2019-05-03 12:43:22 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-05-03 12:58:00 | NaT | 2019-05-03 12:51:11 | 2019-05-03 12:46:36 |
| 5 | NaT | NaT | NaT | 2019-05-03 13:30:47 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-05-03 13:30:47 | NaT | 2019-05-03 13:34:12 | 2019-05-03 13:30:47 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3237 | NaT | NaT | 2019-09-06 11:54:15 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-09-06 11:49:45 | NaT | 2019-09-06 11:45:22 | 2019-09-06 11:40:06 |
| 3238 | NaT | NaT | 2019-09-06 12:26:52 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-09-06 12:18:35 | NaT | 2019-09-06 12:31:49 | 2019-09-06 12:26:52 |
| 3239 | NaT | NaT | NaT | NaT | NaT | 2019-09-06 15:06:00 | NaT | NaT | NaT | NaT | NaT | 2019-09-06 15:01:44 | NaT | 2019-09-06 14:58:15 | 2019-09-06 14:48:06 |
| 3240 | NaT | NaT | NaT | NaT | NaT | 2019-09-06 16:24:28 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-09-06 16:07:29 | 2019-09-06 16:01:34 |
| 3241 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-09-06 17:26:33 | 2019-09-06 17:23:15 |
3129 rows × 15 columns
В таком виде эти данные мало о чём говорят.
Попробуем рассчитать длительность процесса подачи сыпучих материалов.
df_bulk_time['start_time_bulk'] = df_bulk_time.min(axis=1)
df_bulk_time['finish_time_bulk'] = df_bulk_time.max(axis=1)
df_bulk_time['duration_bulk'] = (
df_bulk_time['finish_time_bulk'] - df_bulk_time['start_time_bulk']).astype('timedelta64[s]')
df_bulk_time
| Bulk 1 | Bulk 2 | Bulk 3 | Bulk 4 | Bulk 5 | Bulk 6 | Bulk 7 | Bulk 8 | Bulk 9 | Bulk 10 | Bulk 11 | Bulk 12 | Bulk 13 | Bulk 14 | Bulk 15 | start_time_bulk | finish_time_bulk | duration_bulk | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| key | ||||||||||||||||||
| 1 | NaT | NaT | NaT | 2019-05-03 11:28:48 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-05-03 11:24:31 | NaT | 2019-05-03 11:14:50 | 2019-05-03 11:10:43 | 2019-05-03 11:10:43 | 2019-05-03 11:28:48 | 1085.0 |
| 2 | NaT | NaT | NaT | 2019-05-03 11:36:50 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-05-03 11:53:30 | NaT | 2019-05-03 11:48:37 | 2019-05-03 11:44:39 | 2019-05-03 11:36:50 | 2019-05-03 11:53:30 | 1000.0 |
| 3 | NaT | NaT | NaT | 2019-05-03 12:32:39 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-05-03 12:27:13 | NaT | 2019-05-03 12:21:01 | 2019-05-03 12:16:16 | 2019-05-03 12:16:16 | 2019-05-03 12:32:39 | 983.0 |
| 4 | NaT | NaT | NaT | 2019-05-03 12:43:22 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-05-03 12:58:00 | NaT | 2019-05-03 12:51:11 | 2019-05-03 12:46:36 | 2019-05-03 12:43:22 | 2019-05-03 12:58:00 | 878.0 |
| 5 | NaT | NaT | NaT | 2019-05-03 13:30:47 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-05-03 13:30:47 | NaT | 2019-05-03 13:34:12 | 2019-05-03 13:30:47 | 2019-05-03 13:30:47 | 2019-05-03 13:34:12 | 205.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3237 | NaT | NaT | 2019-09-06 11:54:15 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-09-06 11:49:45 | NaT | 2019-09-06 11:45:22 | 2019-09-06 11:40:06 | 2019-09-06 11:40:06 | 2019-09-06 11:54:15 | 849.0 |
| 3238 | NaT | NaT | 2019-09-06 12:26:52 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-09-06 12:18:35 | NaT | 2019-09-06 12:31:49 | 2019-09-06 12:26:52 | 2019-09-06 12:18:35 | 2019-09-06 12:31:49 | 794.0 |
| 3239 | NaT | NaT | NaT | NaT | NaT | 2019-09-06 15:06:00 | NaT | NaT | NaT | NaT | NaT | 2019-09-06 15:01:44 | NaT | 2019-09-06 14:58:15 | 2019-09-06 14:48:06 | 2019-09-06 14:48:06 | 2019-09-06 15:06:00 | 1074.0 |
| 3240 | NaT | NaT | NaT | NaT | NaT | 2019-09-06 16:24:28 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-09-06 16:07:29 | 2019-09-06 16:01:34 | 2019-09-06 16:01:34 | 2019-09-06 16:24:28 | 1374.0 |
| 3241 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-09-06 17:26:33 | 2019-09-06 17:23:15 | 2019-09-06 17:23:15 | 2019-09-06 17:26:33 | 198.0 |
3129 rows × 18 columns
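Замечание к коду выше: в pandas 2.x приведение `.astype('timedelta64[s]')` больше не переводит timedelta в число секунд. Минимальный набросок на игрушечных данных (значения времён условные), показывающий переносимый вариант через `.dt.total_seconds()`:

```python
import pandas as pd

# Игрушечный аналог df_bulk_time: две партии, два бункера
times = pd.DataFrame({
    'Bulk 4': pd.to_datetime(['2019-05-03 11:28:48', '2019-05-03 11:36:50']),
    'Bulk 15': pd.to_datetime(['2019-05-03 11:10:43', '2019-05-03 11:44:39']),
})

start = times.min(axis=1)    # первое время подачи в партии
finish = times.max(axis=1)   # последнее время подачи в партии
# Переносимый способ получить длительность в секундах
duration = (finish - start).dt.total_seconds()
print(duration.tolist())  # [1085.0, 469.0]
```

Результат совпадает с тем, что даёт `.astype('timedelta64[s]')` в старых версиях pandas.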
Удалим столбцы Bulk 1 - Bulk 15. Получится таблица, похожая на df_arc.
df_bulk_time = df_bulk_time[['start_time_bulk', 'finish_time_bulk','duration_bulk']]
df_bulk_time
| start_time_bulk | finish_time_bulk | duration_bulk | |
|---|---|---|---|
| key | |||
| 1 | 2019-05-03 11:10:43 | 2019-05-03 11:28:48 | 1085.0 |
| 2 | 2019-05-03 11:36:50 | 2019-05-03 11:53:30 | 1000.0 |
| 3 | 2019-05-03 12:16:16 | 2019-05-03 12:32:39 | 983.0 |
| 4 | 2019-05-03 12:43:22 | 2019-05-03 12:58:00 | 878.0 |
| 5 | 2019-05-03 13:30:47 | 2019-05-03 13:34:12 | 205.0 |
| ... | ... | ... | ... |
| 3237 | 2019-09-06 11:40:06 | 2019-09-06 11:54:15 | 849.0 |
| 3238 | 2019-09-06 12:18:35 | 2019-09-06 12:31:49 | 794.0 |
| 3239 | 2019-09-06 14:48:06 | 2019-09-06 15:06:00 | 1074.0 |
| 3240 | 2019-09-06 16:01:34 | 2019-09-06 16:24:28 | 1374.0 |
| 3241 | 2019-09-06 17:23:15 | 2019-09-06 17:26:33 | 198.0 |
3129 rows × 3 columns
# Сводная статистика для всех столбцов в `df_bulk_time`
df_bulk_time.describe(include='all', datetime_is_numeric=True)
| start_time_bulk | finish_time_bulk | duration_bulk | |
|---|---|---|---|
| count | 3129 | 3129 | 3129.000000 |
| mean | 2019-07-05 20:31:19.313518336 | 2019-07-05 20:47:24.220198144 | 964.906679 |
| min | 2019-05-03 11:10:43 | 2019-05-03 11:28:48 | 0.000000 |
| 25% | 2019-06-04 09:12:47 | 2019-06-04 09:22:40 | 485.000000 |
| 50% | 2019-07-03 04:58:44 | 2019-07-03 05:17:25 | 877.000000 |
| 75% | 2019-08-08 00:03:27 | 2019-08-08 00:23:14 | 1311.000000 |
| max | 2019-09-06 17:23:15 | 2019-09-06 17:26:33 | 13683.000000 |
| std | NaN | NaN | 798.088025 |
# Создаем графическую фигуру
plt.figure(figsize=(15, 8))
# Pазбиваем графическую фигуру на 2 графика
# Первый график
plt.subplot(2, 1, 1)
# Гистограмма для 'duration_bulk'
sns.histplot(data=df_bulk_time['duration_bulk'],
alpha=0.8,
label='Длительность обработки с применением сыпучих материалов(с)')
plt.minorticks_on()
plt.grid(which='minor',
linestyle=':')
plt.grid(True)
plt.xlabel('')
plt.ylabel('Частота',
fontsize=12,
color='DarkSlateGray')
plt.legend()
# Второй график
plt.subplot(2, 1, 2)
# Боксплот для 'duration_bulk'
sns.boxplot(data=df_bulk_time,
x='duration_bulk',
orient='horizontal')
plt.minorticks_on()
plt.grid(which='minor',
linestyle=':')
plt.grid(True)
plt.xlabel('Длительность обработки с применением сыпучих материалов',
fontsize=12,
color='DarkSlateGray')
plt.ylabel('')
plt.suptitle('Распределение длительности обработки с применением сыпучих материалов (с)',
fontsize=15,
color='DarkSlateGray')
plt.show()
# Посмотрим уникальные значения по колонкам `df_bulk_time`
get_unique_sorted(df_bulk_time)
Уникальные значения столбца 'start_time_bulk':
['2019-05-03T11:10:43.000000000' '2019-05-03T11:36:50.000000000'
 '2019-05-03T12:16:16.000000000' ... '2019-09-06T14:48:06.000000000'
 '2019-09-06T16:01:34.000000000' '2019-09-06T17:23:15.000000000']
Количество уникальных значений: 3129
****************************************************************************************************
Уникальные значения столбца 'finish_time_bulk':
['2019-05-03T11:28:48.000000000' '2019-05-03T11:53:30.000000000'
 '2019-05-03T12:32:39.000000000' ... '2019-09-06T15:06:00.000000000'
 '2019-09-06T16:24:28.000000000' '2019-09-06T17:26:33.000000000']
Количество уникальных значений: 3129
****************************************************************************************************
Уникальные значения столбца 'duration_bulk':
[    0.    82.   123. ...  9336. 11552. 13683.]
Количество уникальных значений: 1528
****************************************************************************************************
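Функция `get_unique_sorted` определена выше по ноутбуку; для наглядности её минимальный вариант (набросок, реальная реализация может отличаться) мог бы выглядеть так:

```python
import numpy as np
import pandas as pd

def get_unique_sorted(df):
    """Печатает отсортированные уникальные значения каждого столбца и их число."""
    for col in df.columns:
        uniq = np.sort(df[col].unique())  # NaN/NaT оказываются в конце
        print(f"Уникальные значения столбца '{col}':")
        print(uniq)
        print(f"Количество уникальных значений: {len(uniq)}")
        print('*' * 100)

# Пример вызова на игрушечном датафрейме
demo = pd.DataFrame({'a': [3, 1, 3], 'b': [2.0, 5.0, 2.0]})
get_unique_sorted(demo)  # для 'a' выведет [1 3] и количество 2
```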
А теперь без учета нулевых значений.
# Создаем графическую фигуру
plt.figure(figsize=(15, 8))
# Pазбиваем графическую фигуру на 2 графика
# Первый график
plt.subplot(2, 1, 1)
sns.histplot(data=df_bulk_time[df_bulk_time['duration_bulk'] != 0]['duration_bulk'],
alpha=0.8,
label='Длительность обработки с применением сыпучих материалов (с)')
plt.minorticks_on()
plt.grid(which='minor',
linestyle=':')
plt.grid(True)
plt.xlabel('')
plt.ylabel('Частота',
fontsize=12,
color='DarkSlateGray')
plt.legend()
# Второй график
plt.subplot(2, 1, 2)
# Боксплот для 'duration_bulk'
sns.boxplot(data=df_bulk_time[df_bulk_time['duration_bulk'] != 0],
x='duration_bulk',
orient='horizontal')
plt.minorticks_on()
plt.grid(which='minor',
linestyle=':')
plt.grid(True)
plt.xlabel('Длительность обработки с применением сыпучих материалов (с)',
fontsize=12,
color='DarkSlateGray')
plt.ylabel('')
plt.suptitle('Распределение длительности обработки с применением сыпучих материалов (с)',
fontsize=15,
color='DarkSlateGray')
plt.show()
Аномальных значений не обнаружено.
Время добавления сыпучих материалов находится в тех же промежутках, что и ранее рассмотренные данные по обработке стали. Максимальная длительность обработки одной партии составляет 13683 с (около 4 часов).
df_gas — данные о продувке сплава газом
# Проверим столбец `key` на уникальность значений
df_gas['key'].duplicated().sum()
0
# Столбец `key` установим в качестве индекса датафрейма
df_gas = df_gas.set_index('key')
# Посмотрим уникальные значения по колонкам `df_gas`
get_unique_sorted(df_gas)
Уникальные значения столбца 'Газ 1':
[8.39852910e-03 1.66956024e-02 2.63028954e-01 ... 5.21423726e+01
 6.09356892e+01 7.79950397e+01]
Количество уникальных значений: 3239
****************************************************************************************************
df_gas
| Газ 1 | |
|---|---|
| key | |
| 1 | 29.749986 |
| 2 | 12.555561 |
| 3 | 28.554793 |
| 4 | 18.841219 |
| 5 | 5.413692 |
| ... | ... |
| 3237 | 5.543905 |
| 3238 | 6.745669 |
| 3239 | 16.023518 |
| 3240 | 11.863103 |
| 3241 | 12.680959 |
3239 rows × 1 columns
# Распределение количества газа
# Создаем графическую фигуру
plt.figure(figsize=(15, 8))
# Pазбиваем графическую фигуру на 2 графика
# Первый график
plt.subplot(2, 1, 1)
# Гистограмма для 'Газ 1'
sns.histplot(data=df_gas['Газ 1'],
alpha=0.8,
label='Количество газа')
plt.minorticks_on()
plt.grid(which='minor',
linestyle=':')
plt.grid(True)
plt.xlabel('')
plt.ylabel('Частота',
fontsize=12,
color='DarkSlateGray')
plt.legend()
# Второй график
plt.subplot(2, 1, 2)
# Боксплот для 'Газ 1'
sns.boxplot(data=df_gas,
x='Газ 1',
orient='horizontal')
plt.minorticks_on()
plt.grid(which='minor',
linestyle=':')
plt.grid(True)
plt.xlabel('Количество газа',
fontsize=12,
color='DarkSlateGray')
plt.ylabel('')
plt.suptitle('Распределение количества газа',
fontsize=15,
color='DarkSlateGray')
plt.show()
# Сводная статистика для всех столбцов в `df_gas`
df_gas.describe()
| Газ 1 | |
|---|---|
| count | 3239.000000 |
| mean | 11.002062 |
| std | 6.220327 |
| min | 0.008399 |
| 25% | 7.043089 |
| 50% | 9.836267 |
| 75% | 13.769915 |
| max | 77.995040 |
Аномальных значений не обнаружено.
# Приведём таблицу к общепринятому виду
df_gas = df_gas.rename(columns={'Газ 1':'gas_quantities'})
df_gas
| gas_quantities | |
|---|---|
| key | |
| 1 | 29.749986 |
| 2 | 12.555561 |
| 3 | 28.554793 |
| 4 | 18.841219 |
| 5 | 5.413692 |
| ... | ... |
| 3237 | 5.543905 |
| 3238 | 6.745669 |
| 3239 | 16.023518 |
| 3240 | 11.863103 |
| 3241 | 12.680959 |
3239 rows × 1 columns
Датасет содержит 3239 наблюдений (количество партий), а данные об электродах - 3214. Среднее значение количества газа составляет примерно 11. Стандартное отклонение равно приблизительно 6.22, что указывает на относительно высокую вариацию значений. Минимальное значение количества газа составляет 0.008399, а максимальное значение равно 77.995040. Первый квартиль (25%) равен 7.043089, медиана (50%) составляет 9.836267, а третий квартиль (75%) равен 13.769915.
Исходя из этой информации можно сделать вывод, что измерения количества газа имеют широкий диапазон значений со средним 11.002062. Относительно высокое стандартное отклонение указывает на значительную изменчивость количества газа. Партий металла, обработанных газом, больше, чем партий, расплавленных электродами (возможно, особенность технологии).
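Гипотезу о «лишних» партиях можно проверить сравнением множеств ключей. Набросок на условных наборах (в ноутбуке вместо них подставляются реальные `df_gas.index` и ключи `df_arc`):

```python
import pandas as pd

gas_keys = pd.Index(range(1, 11))  # условно: 10 партий с продувкой газом
arc_keys = pd.Index(range(1, 9))   # условно: 8 партий с данными об электродах

# Партии, продутые газом, но без записей об электродах
only_gas = gas_keys.difference(arc_keys)
print(list(only_gas))  # [9, 10]
```

`Index.difference` возвращает отсортированный набор ключей, которых нет во второй таблице.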
df_temp — результаты измерения температуры.
# Проверим столбец `key` на уникальность значений
df_temp['key'].duplicated().sum()
14876
# Посмотрим уникальные значения по колонкам `df_temp`
get_unique_sorted(df_temp)
Уникальные значения столбца 'key':
[   1    2    3 ... 3239 3240 3241]
Количество уникальных значений: 3216
****************************************************************************************************
Уникальные значения столбца 'Время замера':
['2019-05-03T11:02:04.000000000' '2019-05-03T11:07:18.000000000'
 '2019-05-03T11:11:34.000000000' ... '2019-09-06T17:21:48.000000000'
 '2019-09-06T17:24:44.000000000' '2019-09-06T17:30:05.000000000']
Количество уникальных значений: 18092
****************************************************************************************************
Уникальные значения столбца 'Температура':
[1191. 1204. 1208. 1218. 1227. 1515. ... 1696. 1700. 1704. 1705.  nan]
Количество уникальных значений: 173
****************************************************************************************************
# Сводная статистика для всех столбцов в `df_temp`
df_temp.describe(include='all', datetime_is_numeric=True)
| key | Время замера | Температура | |
|---|---|---|---|
| count | 18092.000000 | 18092 | 14665.000000 |
| mean | 1616.460977 | 2019-07-05 13:36:58.791620608 | 1590.722741 |
| min | 1.000000 | 2019-05-03 11:02:04 | 1191.000000 |
| 25% | 807.750000 | 2019-06-04 00:35:01.249999872 | 1580.000000 |
| 50% | 1618.000000 | 2019-07-03 02:11:48 | 1590.000000 |
| 75% | 2429.000000 | 2019-08-07 23:10:05.249999872 | 1599.000000 |
| max | 3241.000000 | 2019-09-06 17:30:05 | 1705.000000 |
| std | 934.641385 | NaN | 20.394381 |
Время измерения температуры находится в тех же промежутках, что и ранее рассмотренные данные по обработке стали.
Количество партий 3216 (количество уникальных значений в столбце key) не совпадает ни с одной из предыдущих таблиц, но номера партий доходят до 3241, как в данных о продувке сплава газом (df_gas). Количество дубликатов в столбце с номером партии - 14876 из 18092 строк.
# Распределение показателей температуры
# Создаем графическую фигуру
plt.figure(figsize=(15, 8))
# Pазбиваем графическую фигуру на 2 графика
# Первый график
plt.subplot(2, 1, 1)
# Гистограмма для 'Температура'
sns.histplot(data=df_temp['Температура'],
alpha=0.8,
label='Температура')
plt.minorticks_on()
plt.grid(which='minor',
linestyle=':')
plt.grid(True)
plt.xlabel('')
plt.ylabel('Частота',
fontsize=12,
color='DarkSlateGray')
plt.legend()
# Второй график
plt.subplot(2, 1, 2)
# Боксплот для 'Температура'
sns.boxplot(data=df_temp,
x='Температура',
orient='horizontal')
plt.minorticks_on()
plt.grid(which='minor',
linestyle=':')
plt.grid(True)
plt.xlabel('Температура',
fontsize=12,
color='DarkSlateGray')
plt.ylabel('')
plt.suptitle('Распределение показателей температуры',
fontsize=15,
color='DarkSlateGray')
plt.show()
Видим пять значений температуры, которые выбиваются из общей массы: 1191, 1204, 1208, 1218, 1227.
Такие температуры более свойственны закалке.
Остальные показатели в норме.
df_temp
| key | Время замера | Температура | |
|---|---|---|---|
| 0 | 1 | 2019-05-03 11:02:04 | 1571.0 |
| 1 | 1 | 2019-05-03 11:07:18 | 1604.0 |
| 2 | 1 | 2019-05-03 11:11:34 | 1618.0 |
| 3 | 1 | 2019-05-03 11:18:04 | 1601.0 |
| 4 | 1 | 2019-05-03 11:25:59 | 1606.0 |
| ... | ... | ... | ... |
| 18087 | 3241 | 2019-09-06 16:55:01 | NaN |
| 18088 | 3241 | 2019-09-06 17:06:38 | NaN |
| 18089 | 3241 | 2019-09-06 17:21:48 | NaN |
| 18090 | 3241 | 2019-09-06 17:24:44 | NaN |
| 18091 | 3241 | 2019-09-06 17:30:05 | NaN |
18092 rows × 3 columns
# Распределение партий по количеству замеров температур
df_count = df_temp.groupby('key')['key'].count().value_counts().reset_index()
df_count.columns = ['кол-во замеров', 'кол-во партий']
total_batches = df_count['кол-во партий'].sum()
df_count['% кол-ва партий'] = df_count['кол-во партий'] / total_batches * 100
df_count
| кол-во замеров | кол-во партий | % кол-ва партий | |
|---|---|---|---|
| 0 | 5 | 892 | 27.736318 |
| 1 | 6 | 759 | 23.600746 |
| 2 | 4 | 520 | 16.169154 |
| 3 | 7 | 490 | 15.236318 |
| 4 | 8 | 205 | 6.374378 |
| 5 | 3 | 174 | 5.410448 |
| 6 | 9 | 84 | 2.611940 |
| 7 | 2 | 39 | 1.212687 |
| 8 | 10 | 28 | 0.870647 |
| 9 | 11 | 9 | 0.279851 |
| 10 | 12 | 5 | 0.155473 |
| 11 | 13 | 3 | 0.093284 |
| 12 | 1 | 2 | 0.062189 |
| 13 | 16 | 2 | 0.062189 |
| 14 | 14 | 2 | 0.062189 |
| 15 | 15 | 1 | 0.031095 |
| 16 | 17 | 1 | 0.031095 |
Большинство партий имели 5 или 6 замеров температуры (соответственно 27.74% и 23.60% всех партий). Это может говорить о том, что такое количество замеров является наиболее распространённым или оптимальным для обеспечения приемлемых результатов.
Партии с количеством замеров 4 и 3 составляют значительную долю (соответственно 16.17% и 5.41%). Вероятно, это связано с особенностями процесса производства или проведения замеров температур.
Партии с количеством замеров 7 и 8 составляют значительную долю (соответственно 15.24% и 6.37%).
Партии, в которых делали более 8 замеров, составляют меньшую долю (всего около 4% всех партий). Возможно, такое количество замеров требуется только в особых случаях или для уточнения данных.
Партии с одним замером или с более чем 13 замерами составляют незначительную долю (в сумме менее 0.5% всех партий). Это может свидетельствовать о случайных или ошибочных замерах в этих партиях.
Таким образом, стандартное количество замеров температур в партии составляет 5-6 и оно является наиболее распространенным. Все значения за пределами этого диапазона можно считать исключениями или особыми случаями.
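Долю партий по числу замеров можно получить и короче, через `value_counts(normalize=True)`. Набросок на игрушечных данных (в ноутбуке вместо `toy` используется `df_temp`):

```python
import pandas as pd

# Игрушечный аналог df_temp: ключ повторяется столько раз, сколько было замеров
toy = pd.DataFrame({'key': [1, 1, 1, 2, 2, 3]})

share = (toy.groupby('key')['key'].count()        # число замеров в каждой партии
            .value_counts(normalize=True) * 100)  # доля партий, %
print(share.round(2).to_dict())
```

Здесь каждая из трёх партий имеет своё число замеров, поэтому каждая категория получает по одной трети.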
# Удаляем строки, где количество замеров меньше двух и температура NaN
df_temp = df_temp.dropna(subset=['Температура'])
df_temp = df_temp.groupby('key').filter(lambda x: len(x) >= 2)
df_temp
| key | Время замера | Температура | |
|---|---|---|---|
| 0 | 1 | 2019-05-03 11:02:04 | 1571.0 |
| 1 | 1 | 2019-05-03 11:07:18 | 1604.0 |
| 2 | 1 | 2019-05-03 11:11:34 | 1618.0 |
| 3 | 1 | 2019-05-03 11:18:04 | 1601.0 |
| 4 | 1 | 2019-05-03 11:25:59 | 1606.0 |
| ... | ... | ... | ... |
| 13921 | 2499 | 2019-08-10 13:33:21 | 1569.0 |
| 13922 | 2499 | 2019-08-10 13:41:34 | 1604.0 |
| 13923 | 2499 | 2019-08-10 13:46:28 | 1593.0 |
| 13924 | 2499 | 2019-08-10 13:54:56 | 1588.0 |
| 13925 | 2499 | 2019-08-10 13:58:58 | 1603.0 |
13924 rows × 3 columns
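Логика фильтрации выше (убрать замеры без температуры, затем партии с единственным замером) на игрушечном примере:

```python
import pandas as pd

toy = pd.DataFrame({
    'key': [1, 1, 2, 2],
    'Температура': [1571.0, 1604.0, 1581.0, None],
})

cleaned = toy.dropna(subset=['Температура'])                    # убираем пустые замеры
cleaned = cleaned.groupby('key').filter(lambda g: len(g) >= 2)  # оставляем партии с >= 2 замерами
print(cleaned['key'].unique())  # остаётся только партия 1
```

У партии 2 после удаления NaN остаётся один замер, и `filter` её отбрасывает.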
Есть информация о том, что важна температура первого и последнего замера, поэтому создадим датасет с необходимыми нам данными на основе имеющегося.
# Группируем данные по key
grouped_data = df_temp.groupby('key')
# Собираем данные по требуемым колонкам
df_result_temp = pd.DataFrame()
df_result_temp['time_first_measurement'] = grouped_data['Время замера'].first()
df_result_temp['time_last_measurement'] = grouped_data['Время замера'].last()
df_result_temp['temperature_first_measurement'] = grouped_data['Температура'].first()
df_result_temp['temperature_last_measurement'] = grouped_data['Температура'].last()
df_result_temp['time_between_measurements'] = (df_result_temp['time_last_measurement'] - \
df_result_temp['time_first_measurement']).astype('timedelta64[s]')
# Выводим результат
df_result_temp
| time_first_measurement | time_last_measurement | temperature_first_measurement | temperature_last_measurement | time_between_measurements | |
|---|---|---|---|---|---|
| key | |||||
| 1 | 2019-05-03 11:02:04 | 2019-05-03 11:30:38 | 1571.0 | 1613.0 | 1714.0 |
| 2 | 2019-05-03 11:34:04 | 2019-05-03 11:55:09 | 1581.0 | 1602.0 | 1265.0 |
| 3 | 2019-05-03 12:06:44 | 2019-05-03 12:35:57 | 1596.0 | 1599.0 | 1753.0 |
| 4 | 2019-05-03 12:39:27 | 2019-05-03 12:59:47 | 1601.0 | 1625.0 | 1220.0 |
| 5 | 2019-05-03 13:11:03 | 2019-05-03 13:36:39 | 1576.0 | 1602.0 | 1536.0 |
| ... | ... | ... | ... | ... | ... |
| 2495 | 2019-08-10 11:27:47 | 2019-08-10 11:50:47 | 1570.0 | 1591.0 | 1380.0 |
| 2496 | 2019-08-10 11:56:48 | 2019-08-10 12:25:13 | 1554.0 | 1591.0 | 1705.0 |
| 2497 | 2019-08-10 12:37:26 | 2019-08-10 12:53:28 | 1571.0 | 1589.0 | 962.0 |
| 2498 | 2019-08-10 12:58:11 | 2019-08-10 13:23:31 | 1591.0 | 1594.0 | 1520.0 |
| 2499 | 2019-08-10 13:33:21 | 2019-08-10 13:58:58 | 1569.0 | 1603.0 | 1537.0 |
2475 rows × 5 columns
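Ту же сводку по первому и последнему замеру можно собрать одним вызовом `.agg()`. Набросок на игрушечных данных (взяты значения первых двух партий; в ноутбуке вместо `toy` используется `df_temp`):

```python
import pandas as pd

toy = pd.DataFrame({
    'key': [1, 1, 2, 2],
    'Время замера': pd.to_datetime(['2019-05-03 11:02:04', '2019-05-03 11:30:38',
                                    '2019-05-03 11:34:04', '2019-05-03 11:55:09']),
    'Температура': [1571.0, 1613.0, 1581.0, 1602.0],
})

# Именованные агрегаты: пара (столбец, функция) задаёт каждый новый столбец
res = toy.groupby('key').agg(
    time_first_measurement=('Время замера', 'first'),
    time_last_measurement=('Время замера', 'last'),
    temperature_first_measurement=('Температура', 'first'),
    temperature_last_measurement=('Температура', 'last'),
)
res['time_between_measurements'] = (
    res['time_last_measurement'] - res['time_first_measurement']
).dt.total_seconds()
print(res['time_between_measurements'].tolist())  # [1714.0, 1265.0]
```

Результат совпадает со строками key=1 и key=2 таблицы выше.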
# Создаем графическую фигуру
plt.figure(figsize=(15, 8))
# Pазбиваем графическую фигуру на 2 графика
# Первый график
plt.subplot(2, 1, 1)
# Гистограмма для 'temperature_first_measurement'
sns.histplot(data=df_result_temp['temperature_first_measurement'],
color='green',
alpha=0.8,
label='Температура при первом замере')
# Гистограмма для 'temperature_last_measurement'
sns.histplot(data=df_result_temp['temperature_last_measurement'],
color='red',
alpha=0.5,
label='Температура при последнем замере')
plt.minorticks_on()
plt.grid(which='minor',
linestyle=':')
plt.grid(True)
plt.xlabel('')
plt.ylabel('Частота',
fontsize=12,
color='DarkSlateGray')
plt.legend()
# Второй график
plt.subplot(2, 1, 2)
# Боксплот для 'temperature_first_measurement' и 'temperature_last_measurement'
sns.boxplot(data=df_result_temp[['temperature_first_measurement', 'temperature_last_measurement']],
orient='horizontal',
palette=['green', 'red'])
plt.minorticks_on()
plt.grid(which='minor',
linestyle=':')
plt.grid(True)
plt.xlabel('Температура',
fontsize=12,
color='DarkSlateGray')
plt.ylabel('')
plt.suptitle('Распределение температуры (при первом и при последнем замере)',
fontsize=15,
color='DarkSlateGray')
plt.show()
df_result_temp[(df_result_temp['temperature_first_measurement'] < 1300) | (
df_result_temp['temperature_last_measurement'] < 1300)]
| time_first_measurement | time_last_measurement | temperature_first_measurement | temperature_last_measurement | time_between_measurements | |
|---|---|---|---|---|---|
| key | |||||
| 867 | 2019-06-06 08:03:39 | 2019-06-06 08:48:23 | 1191.0 | 1599.0 | 2684.0 |
| 1214 | 2019-06-18 08:01:03 | 2019-06-18 08:43:56 | 1208.0 | 1591.0 | 2573.0 |
| 1619 | 2019-07-03 02:34:41 | 2019-07-03 02:43:59 | 1218.0 | 1590.0 | 558.0 |
| 2052 | 2019-07-25 08:49:15 | 2019-07-25 09:27:03 | 1227.0 | 1592.0 | 2268.0 |
Вопрос по аномальным температурам остаётся открытым. (Пять значений температуры, которые выбиваются из общей массы: 1191, 1204, 1208, 1218, 1227. В чём причина таких показателей (остыл ковш, неисправность измерительной аппаратуры)? Укладываются ли они в рамки технологии, или это аномалии? Нужен ли по технологии нагрев выше 1600$^{\circ}$C?)
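Пока вопрос открыт, подозрительные партии удобно отбирать вспомогательной функцией. Ниже набросок с гипотетическим порогом 1300 °C (реальное технологическое ограничение нужно уточнить у заказчика); имя функции условное.

```python
import pandas as pd

# Гипотетический порог: замеры ниже 1300 °C считаем подозрительными
SUSPICIOUS_TEMP_THRESHOLD = 1300

def flag_suspicious_temps(df: pd.DataFrame,
                          cols=('temperature_first_measurement',
                                'temperature_last_measurement'),
                          threshold: float = SUSPICIOUS_TEMP_THRESHOLD) -> pd.DataFrame:
    """Возвращает строки, где хотя бы один замер ниже порога."""
    mask = pd.Series(False, index=df.index)
    for col in cols:
        mask |= df[col] < threshold
    return df[mask]

# Игрушечный пример
toy = pd.DataFrame({
    'temperature_first_measurement': [1571.0, 1191.0, 1596.0],
    'temperature_last_measurement': [1613.0, 1599.0, 1599.0],
})
print(flag_suspicious_temps(toy))  # остаётся только строка с 1191.0
```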
df_wire — данные о проволочных материалах (объём).
# Проверим столбец `key` на уникальность значений
df_wire['key'].duplicated().sum()
0
# Столбец `key` установим в качестве индекса датафрейма
df_wire = df_wire.set_index('key')
# Сводная статистика для всех столбцов в `df_wire`
df_wire.describe()
| Wire 1 | Wire 2 | Wire 3 | Wire 4 | Wire 5 | Wire 6 | Wire 7 | Wire 8 | Wire 9 | |
|---|---|---|---|---|---|---|---|---|---|
| count | 3055.000000 | 1079.000000 | 63.000000 | 14.000000 | 1.000 | 73.000000 | 11.000000 | 19.000000 | 29.000000 |
| mean | 100.895853 | 50.577323 | 189.482681 | 57.442841 | 15.132 | 48.016974 | 10.039007 | 53.625193 | 34.155752 |
| std | 42.012518 | 39.320216 | 99.513444 | 28.824667 | NaN | 33.919845 | 8.610584 | 16.881728 | 19.931616 |
| min | 1.918800 | 0.030160 | 0.144144 | 24.148801 | 15.132 | 0.034320 | 0.234208 | 45.076721 | 4.622800 |
| 25% | 72.115684 | 20.193680 | 95.135044 | 40.807002 | 15.132 | 25.053600 | 6.762756 | 46.094879 | 22.058401 |
| 50% | 100.158234 | 40.142956 | 235.194977 | 45.234282 | 15.132 | 42.076324 | 9.017009 | 46.279999 | 30.066399 |
| 75% | 126.060483 | 70.227558 | 276.252014 | 76.124619 | 15.132 | 64.212723 | 11.886057 | 48.089603 | 43.862003 |
| max | 330.314424 | 282.780152 | 385.008668 | 113.231044 | 15.132 | 180.454575 | 32.847674 | 102.762401 | 90.053604 |
В целом, по Wire 1, Wire 2 и Wire 3 данных больше всего (3055, 1079 и 63 наблюдения соответственно), тогда как по Wire 4 – Wire 9 данных намного меньше. Различия в средних значениях и стандартных отклонениях между проволоками указывают на разные характеристики и свойства этих материалов.
Wire 5 использовали только один раз.
# Посмотрим на уникальную запись с добавлением `Wire 5`
df_wire[df_wire['Wire 5'].notna()]
| Wire 1 | Wire 2 | Wire 3 | Wire 4 | Wire 5 | Wire 6 | Wire 7 | Wire 8 | Wire 9 | |
|---|---|---|---|---|---|---|---|---|---|
| key | |||||||||
| 2567 | 18.30192 | NaN | 96.288193 | NaN | 15.132 | 73.307526 | NaN | NaN | NaN |
# Визуализируем распределения по колонкам `df_wire`
plt.figure(figsize=(15, 10))
sns.boxplot(data=df_wire,
orient='horizontal')
plt.xlabel('Объем подачи проволочных материалов',
fontsize=12,
color='DarkSlateGray')
plt.ylabel('Материал',
fontsize=12,
color='DarkSlateGray')
plt.title('Boxplot подачи проволочных материалов',
fontsize=15,
color='DarkSlateGray')
plt.minorticks_on()
plt.grid(which='minor',
linestyle=':')
plt.grid(True)
Всё выглядит хорошо, явных выбросов в объеме проволочных материалов нет.
# Заполним значения NaN в DataFrame нулями
df_wire = df_wire.fillna(0)
# Рассчитаем сумму каждой строки и создадим новый столбец с названием `wire_sum`
df_wire['wire_sum'] = df_wire.sum(axis=1)
# Заменим пробелы в названиях столбцов на нижнее подчеркивание и приведем их к нижнему регистру
df_wire.columns = [x.replace(' ','_').lower() for x in df_wire.columns]
df_wire
| wire_1 | wire_2 | wire_3 | wire_4 | wire_5 | wire_6 | wire_7 | wire_8 | wire_9 | wire_sum | |
|---|---|---|---|---|---|---|---|---|---|---|
| key | ||||||||||
| 1 | 60.059998 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 60.059998 |
| 2 | 96.052315 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 96.052315 |
| 3 | 91.160157 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 91.160157 |
| 4 | 89.063515 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 89.063515 |
| 5 | 89.238236 | 9.11456 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 98.352796 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3237 | 38.088959 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 38.088959 |
| 3238 | 56.128799 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 56.128799 |
| 3239 | 143.357761 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 143.357761 |
| 3240 | 34.070400 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 34.070400 |
| 3241 | 63.117595 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 63.117595 |
3081 rows × 10 columns
df_wire_time — данные о проволочных материалах (время).
# Проверим столбец `key` на уникальность значений
df_wire_time['key'].duplicated().sum()
0
# Столбец `key` установим в качестве индекса датафрейма
df_wire_time = df_wire_time.set_index('key')
# Посмотрим уникальные значения по колонкам `df_wire_time`
get_unique_sorted(df_wire_time)
Уникальные значения столбца 'Wire 1':
['2019-05-03T11:06:19.000000000' '2019-05-03T11:36:50.000000000'
'2019-05-03T12:11:46.000000000' ... '2019-09-06T15:33:55.000000000'
'2019-09-06T17:10:06.000000000' 'NaT']
Количество уникальных значений: 3056
****************************************************************************************************
Уникальные значения столбца 'Wire 2':
['2019-05-03T13:15:34.000000000' '2019-05-03T13:48:52.000000000'
'2019-05-03T15:39:37.000000000' ... '2019-09-06T01:38:22.000000000'
'2019-09-06T07:35:40.000000000' 'NaT']
Количество уникальных значений: 1080
****************************************************************************************************
Уникальные значения столбца 'Wire 3':
['2019-05-04T04:34:27.000000000' '2019-05-04T05:41:29.000000000'
'2019-05-07T15:39:35.000000000' '2019-05-07T17:12:49.000000000'
'2019-05-07T22:24:56.000000000' '2019-05-07T23:37:44.000000000'
'2019-05-08T02:25:41.000000000' '2019-05-08T06:25:36.000000000'
'2019-05-08T07:00:18.000000000' '2019-05-08T07:47:18.000000000'
'2019-05-19T02:20:39.000000000' '2019-05-19T02:57:29.000000000'
'2019-05-19T10:05:03.000000000' '2019-05-19T11:07:37.000000000'
'2019-06-04T12:35:52.000000000' '2019-06-10T15:07:04.000000000'
'2019-06-12T13:28:13.000000000' '2019-06-12T14:01:36.000000000'
'2019-06-12T14:38:14.000000000' '2019-06-14T02:18:05.000000000'
'2019-06-14T21:33:28.000000000' '2019-06-14T22:16:16.000000000'
'2019-07-10T18:21:05.000000000' '2019-07-10T19:02:13.000000000'
'2019-07-10T20:08:55.000000000' '2019-07-11T18:01:03.000000000'
'2019-07-11T19:03:45.000000000' '2019-07-19T04:01:03.000000000'
'2019-07-19T05:01:12.000000000' '2019-07-19T05:51:49.000000000'
'2019-07-21T04:23:54.000000000' '2019-07-21T10:04:47.000000000'
'2019-07-27T03:44:01.000000000' '2019-07-27T05:11:06.000000000'
'2019-07-27T06:26:25.000000000' '2019-07-27T08:51:53.000000000'
'2019-07-27T14:05:34.000000000' '2019-07-27T17:11:04.000000000'
'2019-07-27T20:02:21.000000000' '2019-08-10T14:44:43.000000000'
'2019-08-10T15:43:16.000000000' '2019-08-10T19:15:17.000000000'
'2019-08-10T20:02:36.000000000' '2019-08-10T20:53:12.000000000'
'2019-08-11T03:33:25.000000000' '2019-08-12T20:12:36.000000000'
'2019-08-12T22:14:01.000000000' '2019-08-12T23:35:31.000000000'
'2019-08-13T01:29:08.000000000' '2019-08-13T03:21:46.000000000'
'2019-08-13T07:04:38.000000000' '2019-08-13T07:52:42.000000000'
'2019-08-13T09:36:56.000000000' '2019-08-17T09:26:28.000000000'
'2019-08-17T10:16:32.000000000' '2019-08-17T10:55:55.000000000'
'2019-08-25T23:47:50.000000000' '2019-08-26T01:15:36.000000000'
'2019-08-26T02:15:46.000000000' '2019-08-26T03:34:51.000000000'
'2019-09-01T12:41:21.000000000' '2019-09-01T13:28:06.000000000'
'2019-09-02T07:14:44.000000000' 'NaT']
Количество уникальных значений: 64
****************************************************************************************************
Уникальные значения столбца 'Wire 4':
['2019-05-07T15:19:17.000000000' '2019-05-07T16:46:56.000000000'
'2019-05-07T23:21:42.000000000' '2019-07-20T16:11:26.000000000'
'2019-07-20T23:21:17.000000000' '2019-07-27T03:25:58.000000000'
'2019-07-27T05:07:46.000000000' '2019-07-27T06:23:07.000000000'
'2019-07-27T08:32:14.000000000' '2019-07-27T13:57:58.000000000'
'2019-07-27T17:07:27.000000000' '2019-07-27T19:41:36.000000000'
'2019-08-12T19:51:09.000000000' '2019-08-13T03:16:45.000000000'
'NaT']
Количество уникальных значений: 15
****************************************************************************************************
Уникальные значения столбца 'Wire 5':
['2019-08-13T06:14:30.000000000' 'NaT']
Количество уникальных значений: 2
****************************************************************************************************
Уникальные значения столбца 'Wire 6':
['2019-05-07T14:46:05.000000000' '2019-05-07T16:16:34.000000000'
'2019-05-07T17:37:05.000000000' '2019-05-07T21:53:14.000000000'
'2019-05-07T23:01:42.000000000' '2019-05-08T01:53:35.000000000'
'2019-05-08T06:19:00.000000000' '2019-05-08T06:54:07.000000000'
'2019-05-08T07:39:19.000000000' '2019-05-08T09:01:22.000000000'
'2019-05-08T14:00:10.000000000' '2019-05-08T15:07:48.000000000'
'2019-05-08T17:02:28.000000000' '2019-05-08T17:52:46.000000000'
'2019-05-08T18:38:10.000000000' '2019-05-08T19:14:44.000000000'
'2019-05-08T20:11:04.000000000' '2019-05-08T21:09:25.000000000'
'2019-05-08T21:47:30.000000000' '2019-05-08T22:48:56.000000000'
'2019-05-08T23:48:40.000000000' '2019-05-09T00:31:45.000000000'
'2019-05-09T02:52:53.000000000' '2019-05-09T03:42:13.000000000'
'2019-05-09T05:40:27.000000000' '2019-07-27T02:13:58.000000000'
'2019-07-27T03:11:17.000000000' '2019-07-27T04:47:06.000000000'
'2019-07-27T06:09:34.000000000' '2019-07-27T08:24:44.000000000'
'2019-07-27T13:48:37.000000000' '2019-07-27T16:40:06.000000000'
'2019-07-27T19:15:43.000000000' '2019-07-27T22:52:21.000000000'
'2019-07-28T01:07:46.000000000' '2019-07-28T01:40:36.000000000'
'2019-07-28T05:00:32.000000000' '2019-07-28T10:26:23.000000000'
'2019-07-28T15:27:07.000000000' '2019-07-28T15:57:59.000000000'
'2019-07-28T16:30:04.000000000' '2019-07-28T19:46:20.000000000'
'2019-07-29T05:09:26.000000000' '2019-07-30T19:04:23.000000000'
'2019-07-30T22:33:45.000000000' '2019-08-12T19:49:14.000000000'
'2019-08-12T22:06:25.000000000' '2019-08-12T23:23:18.000000000'
'2019-08-13T01:17:21.000000000' '2019-08-13T03:04:56.000000000'
'2019-08-13T06:10:36.000000000' '2019-08-13T09:26:22.000000000'
'2019-08-13T10:30:20.000000000' '2019-08-13T12:50:54.000000000'
'2019-08-13T13:33:02.000000000' '2019-08-13T15:13:20.000000000'
'2019-08-13T16:09:14.000000000' '2019-08-13T16:48:03.000000000'
'2019-08-13T18:19:24.000000000' '2019-08-13T20:59:00.000000000'
'2019-08-13T22:23:13.000000000' '2019-08-13T22:51:40.000000000'
'2019-08-14T03:53:23.000000000' '2019-08-14T04:41:52.000000000'
'2019-08-18T12:23:50.000000000' '2019-08-18T13:32:48.000000000'
'2019-08-18T14:17:09.000000000' '2019-08-18T14:51:35.000000000'
'2019-08-18T15:43:21.000000000' '2019-08-18T16:20:32.000000000'
'2019-08-18T17:05:42.000000000' '2019-08-18T18:25:03.000000000'
'2019-08-18T19:10:56.000000000' 'NaT']
Количество уникальных значений: 74
****************************************************************************************************
Уникальные значения столбца 'Wire 7':
['2019-07-27T05:49:05.000000000' '2019-07-27T07:56:34.000000000'
'2019-07-27T13:43:32.000000000' '2019-07-27T19:11:22.000000000'
'2019-08-12T19:47:06.000000000' '2019-08-12T21:48:11.000000000'
'2019-08-12T23:20:37.000000000' '2019-08-13T01:13:45.000000000'
'2019-08-13T02:52:06.000000000' '2019-08-13T07:49:49.000000000'
'2019-08-13T10:25:22.000000000' 'NaT']
Количество уникальных значений: 12
****************************************************************************************************
Уникальные значения столбца 'Wire 8':
['2019-05-14T11:29:24.000000000' '2019-05-14T12:18:01.000000000'
'2019-05-14T12:52:37.000000000' '2019-05-14T13:20:41.000000000'
'2019-05-14T14:05:13.000000000' '2019-05-14T14:35:02.000000000'
'2019-05-14T15:07:59.000000000' '2019-05-14T15:43:01.000000000'
'2019-05-14T16:18:12.000000000' '2019-05-14T16:55:09.000000000'
'2019-07-08T16:56:51.000000000' '2019-07-22T08:33:41.000000000'
'2019-07-22T10:05:30.000000000' '2019-08-16T04:37:28.000000000'
'2019-08-16T05:36:19.000000000' '2019-08-16T06:42:52.000000000'
'2019-08-16T07:39:37.000000000' '2019-08-16T08:14:32.000000000'
'2019-08-16T08:56:23.000000000' 'NaT']
Количество уникальных значений: 20
****************************************************************************************************
Уникальные значения столбца 'Wire 9':
['2019-05-04T17:21:27.000000000' '2019-06-02T15:42:11.000000000'
'2019-06-02T18:02:35.000000000' '2019-06-02T18:51:06.000000000'
'2019-06-02T20:35:26.000000000' '2019-06-02T23:08:17.000000000'
'2019-06-09T11:01:10.000000000' '2019-06-09T19:36:17.000000000'
'2019-06-09T19:55:31.000000000' '2019-06-09T20:29:31.000000000'
'2019-06-09T21:38:20.000000000' '2019-06-09T22:22:10.000000000'
'2019-06-09T22:50:56.000000000' '2019-06-11T07:58:12.000000000'
'2019-06-11T10:10:21.000000000' '2019-07-04T21:52:41.000000000'
'2019-07-04T23:02:08.000000000' '2019-07-05T00:44:55.000000000'
'2019-07-05T05:14:22.000000000' '2019-07-05T05:40:57.000000000'
'2019-08-09T06:04:40.000000000' '2019-08-09T06:49:04.000000000'
'2019-08-09T08:01:55.000000000' '2019-08-09T09:11:30.000000000'
'2019-08-09T10:17:13.000000000' '2019-08-09T11:16:48.000000000'
'2019-08-09T13:07:31.000000000' '2019-08-09T14:03:42.000000000'
'2019-09-03T12:55:23.000000000' 'NaT']
Количество уникальных значений: 30
****************************************************************************************************
df_wire_time
| Wire 1 | Wire 2 | Wire 3 | Wire 4 | Wire 5 | Wire 6 | Wire 7 | Wire 8 | Wire 9 | |
|---|---|---|---|---|---|---|---|---|---|
| key | |||||||||
| 1 | 2019-05-03 11:06:19 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT |
| 2 | 2019-05-03 11:36:50 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT |
| 3 | 2019-05-03 12:11:46 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT |
| 4 | 2019-05-03 12:43:22 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT |
| 5 | 2019-05-03 13:20:44 | 2019-05-03 13:15:34 | NaT | NaT | NaT | NaT | NaT | NaT | NaT |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3237 | 2019-09-06 11:33:38 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT |
| 3238 | 2019-09-06 12:18:35 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT |
| 3239 | 2019-09-06 14:36:11 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT |
| 3240 | 2019-09-06 15:33:55 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT |
| 3241 | 2019-09-06 17:10:06 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT |
3081 rows × 9 columns
В таком виде эти данные мало о чём говорят.
Рассчитаем длительность процесса подачи проволочных материалов.
df_wire_time['start_time_wire'] = df_wire_time.min(axis=1)
df_wire_time['finish_time_wire'] = df_wire_time.max(axis=1)
df_wire_time['duration_wire'] = (
    df_wire_time['finish_time_wire'] - df_wire_time['start_time_wire']).dt.total_seconds()
df_wire_time
| Wire 1 | Wire 2 | Wire 3 | Wire 4 | Wire 5 | Wire 6 | Wire 7 | Wire 8 | Wire 9 | start_time_wire | finish_time_wire | duration_wire | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| key | ||||||||||||
| 1 | 2019-05-03 11:06:19 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-05-03 11:06:19 | 2019-05-03 11:06:19 | 0.0 |
| 2 | 2019-05-03 11:36:50 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-05-03 11:36:50 | 2019-05-03 11:36:50 | 0.0 |
| 3 | 2019-05-03 12:11:46 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-05-03 12:11:46 | 2019-05-03 12:11:46 | 0.0 |
| 4 | 2019-05-03 12:43:22 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-05-03 12:43:22 | 2019-05-03 12:43:22 | 0.0 |
| 5 | 2019-05-03 13:20:44 | 2019-05-03 13:15:34 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-05-03 13:15:34 | 2019-05-03 13:20:44 | 310.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3237 | 2019-09-06 11:33:38 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-09-06 11:33:38 | 2019-09-06 11:33:38 | 0.0 |
| 3238 | 2019-09-06 12:18:35 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-09-06 12:18:35 | 2019-09-06 12:18:35 | 0.0 |
| 3239 | 2019-09-06 14:36:11 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-09-06 14:36:11 | 2019-09-06 14:36:11 | 0.0 |
| 3240 | 2019-09-06 15:33:55 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-09-06 15:33:55 | 2019-09-06 15:33:55 | 0.0 |
| 3241 | 2019-09-06 17:10:06 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | 2019-09-06 17:10:06 | 2019-09-06 17:10:06 | 0.0 |
3081 rows × 12 columns
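Логику расчёта start/finish/duration можно проверить на игрушечном примере с двумя условными столбцами (NaT при min/max по строке игнорируется, поэтому для одиночной подачи длительность равна нулю):

```python
import pandas as pd

toy = pd.DataFrame({
    'Wire 1': pd.to_datetime(['2019-05-03 13:20:44', '2019-05-03 11:06:19']),
    'Wire 2': pd.to_datetime(['2019-05-03 13:15:34', pd.NaT]),
})

# Минимум и максимум по строке: начало и конец подачи проволоки
toy['start_time_wire'] = toy[['Wire 1', 'Wire 2']].min(axis=1)
toy['finish_time_wire'] = toy[['Wire 1', 'Wire 2']].max(axis=1)
# .dt.total_seconds() даёт число секунд и работает в актуальных версиях pandas,
# где .astype('timedelta64[s]') для этой цели уже не поддерживается
toy['duration_wire'] = (
    toy['finish_time_wire'] - toy['start_time_wire']).dt.total_seconds()
print(toy['duration_wire'].tolist())  # [310.0, 0.0]
```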
Удалим столбцы Wire 1 – Wire 9. Получится таблица, похожая на df_bulk_time.
df_wire_time = df_wire_time[['start_time_wire', 'finish_time_wire','duration_wire']]
df_wire_time
| start_time_wire | finish_time_wire | duration_wire | |
|---|---|---|---|
| key | |||
| 1 | 2019-05-03 11:06:19 | 2019-05-03 11:06:19 | 0.0 |
| 2 | 2019-05-03 11:36:50 | 2019-05-03 11:36:50 | 0.0 |
| 3 | 2019-05-03 12:11:46 | 2019-05-03 12:11:46 | 0.0 |
| 4 | 2019-05-03 12:43:22 | 2019-05-03 12:43:22 | 0.0 |
| 5 | 2019-05-03 13:15:34 | 2019-05-03 13:20:44 | 310.0 |
| ... | ... | ... | ... |
| 3237 | 2019-09-06 11:33:38 | 2019-09-06 11:33:38 | 0.0 |
| 3238 | 2019-09-06 12:18:35 | 2019-09-06 12:18:35 | 0.0 |
| 3239 | 2019-09-06 14:36:11 | 2019-09-06 14:36:11 | 0.0 |
| 3240 | 2019-09-06 15:33:55 | 2019-09-06 15:33:55 | 0.0 |
| 3241 | 2019-09-06 17:10:06 | 2019-09-06 17:10:06 | 0.0 |
3081 rows × 3 columns
# Сводная статистика для всех столбцов в `df_wire_time`
df_wire_time.describe(include='all', datetime_is_numeric=True)
| start_time_wire | finish_time_wire | duration_wire | |
|---|---|---|---|
| count | 3081 | 3081 | 3081.000000 |
| mean | 2019-07-05 19:30:10.739045632 | 2019-07-05 19:33:45.258357760 | 214.519312 |
| min | 2019-05-03 11:06:19 | 2019-05-03 11:06:19 | 0.000000 |
| 25% | 2019-06-04 15:15:14 | 2019-06-04 15:15:14 | 0.000000 |
| 50% | 2019-07-03 02:37:20 | 2019-07-03 02:41:39 | 0.000000 |
| 75% | 2019-08-08 02:31:52 | 2019-08-08 02:31:52 | 401.000000 |
| max | 2019-09-06 17:10:06 | 2019-09-06 17:10:06 | 5937.000000 |
| std | NaN | NaN | 396.131967 |
# Посмотрим уникальные значения по колонкам `df_wire_time`
get_unique_sorted(df_wire_time)
Уникальные значения столбца 'start_time_wire':
['2019-05-03T11:06:19.000000000' '2019-05-03T11:36:50.000000000'
'2019-05-03T12:11:46.000000000' ... '2019-09-06T14:36:11.000000000'
'2019-09-06T15:33:55.000000000' '2019-09-06T17:10:06.000000000']
Количество уникальных значений: 3081
****************************************************************************************************
Уникальные значения столбца 'finish_time_wire':
['2019-05-03T11:06:19.000000000' '2019-05-03T11:36:50.000000000'
'2019-05-03T12:11:46.000000000' ... '2019-09-06T14:36:11.000000000'
'2019-09-06T15:33:55.000000000' '2019-09-06T17:10:06.000000000']
Количество уникальных значений: 3081
****************************************************************************************************
Уникальные значения столбца 'duration_wire':
[   0.   90.  125. ... 5013. 5492. 5937.]
Количество уникальных значений: 625
****************************************************************************************************
Посмотрим на распределение длительности обработки с применением проволочных материалов без учета нулевых значений.
# Создаем графическую фигуру
plt.figure(figsize=(15, 8))
# Pазбиваем графическую фигуру на 2 графика
# Первый график
plt.subplot(2, 1, 1)
# Гистограмма для 'duration_wire'
sns.histplot(data=df_wire_time[df_wire_time['duration_wire'] != 0]['duration_wire'],
alpha=0.8,
label='Длительность обработки с применением проволочных материалов (с)')
plt.minorticks_on()
plt.grid(which='minor',
linestyle=':')
plt.grid(True)
plt.xlabel('')
plt.ylabel('Частота',
fontsize=12,
color='DarkSlateGray')
plt.legend()
# Второй график
plt.subplot(2, 1, 2)
# Боксплот для 'duration_wire'
sns.boxplot(data=df_wire_time[df_wire_time['duration_wire'] != 0],
x='duration_wire',
orient='horizontal')
plt.minorticks_on()
plt.grid(which='minor',
linestyle=':')
plt.grid(True)
plt.xlabel('Длительность обработки с применением проволочных материалов (с)',
fontsize=12,
color='DarkSlateGray')
plt.ylabel('')
plt.suptitle('Распределение длительности обработки с применением проволочных материалов (с)',
fontsize=15,
color='DarkSlateGray')
plt.show()
Аномальных значений не обнаружено.
Время добавления проволочных материалов находится в тех же промежутках, что и ранее рассмотренные данные по обработке стали. Максимальная длительность обработки одной партии составляет 5937 с (около 1,6 часа).
Вопросы к заказчику по итогам обзора данных:
data_arc_new.csv (данные об электродах) — вопрос по промежутку с 13 по 18 июля.
data_bulk_new.csv (данные о подаче сыпучих материалов, объём):
Bulk 1...Bulk 15 — это разные материалы? (Да, разные материалы.)
Bulk 8 действительно использовали один раз за период с 2019-05-03 по 2019-09-06?
Bulk 5 имеет единичное значение 603 при том, что значения, превышающие верхнюю границу выбросов распределения, — 234, 242, 256, 293. Это нормально?
Bulk 12 имеет единичное значение 1849 при том, что значения, превышающие верхнюю границу выбросов распределения, лежат в диапазоне 496–853. Это нормально?
После ознакомления с данными и уточнения информации дообработаем таблицы: удалим редкие (единичные) случаи и аномальные значения.
# Удалим редкий сыпучий материал
df_bulk = df_bulk.drop('bulk_8', axis=1)
# Удалим данные с температурами ниже температуры плавления
df_result_temp = df_result_temp.drop(df_result_temp[(df_result_temp['temperature_first_measurement'] < 1300) | (
df_result_temp['temperature_last_measurement'] < 1300)].index)
# Удалим редкий проволочный материал
df_wire = df_wire.drop('wire_5', axis=1)
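Такие удаления удобно делать устойчивыми к повторному запуску ячейки: у `DataFrame.drop` есть параметр `errors='ignore'`, при котором повторный вызов не падает с `KeyError`, если столбец уже удалён. Набросок на условных данных:

```python
import pandas as pd

toy = pd.DataFrame({'bulk_8': [0.0], 'bulk_9': [1.0]})

# Первый вызов удаляет столбец, повторный — просто ничего не делает
toy = toy.drop(columns='bulk_8', errors='ignore')
toy = toy.drop(columns='bulk_8', errors='ignore')
print(toy.columns.tolist())  # ['bulk_9']
```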
В результате предобработки получили:
display(df_arc_key, df_bulk, df_bulk_time, df_gas, df_result_temp, df_wire, df_wire_time)
| count_arc | start_arc | end_arc | active_power | reactive_power | apparent_power | arc_duration | energy_consumption | |
|---|---|---|---|---|---|---|---|---|
| key | ||||||||
| 1 | 5 | 2019-05-03 11:02:14 | 2019-05-03 11:28:37 | 3.036730 | 2.142821 | 3.718736 | 1098.0 | 770.282114 |
| 2 | 4 | 2019-05-03 11:34:14 | 2019-05-03 11:53:18 | 2.139408 | 1.453357 | 2.588349 | 811.0 | 481.760005 |
| 3 | 5 | 2019-05-03 12:06:54 | 2019-05-03 12:32:19 | 4.063641 | 2.937457 | 5.019223 | 655.0 | 722.837668 |
| 4 | 4 | 2019-05-03 12:39:37 | 2019-05-03 12:57:50 | 2.706489 | 2.056992 | 3.400038 | 741.0 | 683.455597 |
| 5 | 4 | 2019-05-03 13:11:13 | 2019-05-03 13:33:55 | 2.252950 | 1.687991 | 2.816980 | 869.0 | 512.169934 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3237 | 5 | 2019-09-06 11:31:25 | 2019-09-06 11:53:55 | 2.541872 | 2.025417 | 3.250657 | 909.0 | 630.503534 |
| 3238 | 3 | 2019-09-06 12:16:25 | 2019-09-06 12:31:35 | 1.374821 | 1.038103 | 1.723937 | 546.0 | 286.052252 |
| 3239 | 8 | 2019-09-06 14:17:00 | 2019-09-06 15:05:50 | 4.848005 | 3.541541 | 6.014480 | 1216.0 | 941.538764 |
| 3240 | 5 | 2019-09-06 15:25:31 | 2019-09-06 16:24:15 | 3.317679 | 2.373552 | 4.082920 | 839.0 | 657.439848 |
| 3241 | 5 | 2019-09-06 16:49:05 | 2019-09-06 17:26:15 | 3.045283 | 2.140011 | 3.722880 | 659.0 | 538.258300 |
3214 rows × 8 columns
| bulk_1 | bulk_2 | bulk_3 | bulk_4 | bulk_5 | bulk_6 | bulk_7 | bulk_9 | bulk_10 | bulk_11 | bulk_12 | bulk_13 | bulk_14 | bulk_15 | bulk_sum | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| key | |||||||||||||||
| 1 | 0.0 | 0.0 | 0.0 | 43.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 206.0 | 0.0 | 150.0 | 154.0 | 553.0 |
| 2 | 0.0 | 0.0 | 0.0 | 73.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 206.0 | 0.0 | 149.0 | 154.0 | 582.0 |
| 3 | 0.0 | 0.0 | 0.0 | 34.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 205.0 | 0.0 | 152.0 | 153.0 | 544.0 |
| 4 | 0.0 | 0.0 | 0.0 | 81.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 207.0 | 0.0 | 153.0 | 154.0 | 595.0 |
| 5 | 0.0 | 0.0 | 0.0 | 78.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 203.0 | 0.0 | 151.0 | 152.0 | 584.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3237 | 0.0 | 0.0 | 170.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 252.0 | 0.0 | 130.0 | 206.0 | 758.0 |
| 3238 | 0.0 | 0.0 | 126.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 254.0 | 0.0 | 108.0 | 106.0 | 594.0 |
| 3239 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 114.0 | 0.0 | 0.0 | 0.0 | 0.0 | 158.0 | 0.0 | 270.0 | 88.0 | 630.0 |
| 3240 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 26.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 192.0 | 54.0 | 272.0 |
| 3241 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 180.0 | 52.0 | 232.0 |
3129 rows × 15 columns
| start_time_bulk | finish_time_bulk | duration_bulk | |
|---|---|---|---|
| key | |||
| 1 | 2019-05-03 11:10:43 | 2019-05-03 11:28:48 | 1085.0 |
| 2 | 2019-05-03 11:36:50 | 2019-05-03 11:53:30 | 1000.0 |
| 3 | 2019-05-03 12:16:16 | 2019-05-03 12:32:39 | 983.0 |
| 4 | 2019-05-03 12:43:22 | 2019-05-03 12:58:00 | 878.0 |
| 5 | 2019-05-03 13:30:47 | 2019-05-03 13:34:12 | 205.0 |
| ... | ... | ... | ... |
| 3237 | 2019-09-06 11:40:06 | 2019-09-06 11:54:15 | 849.0 |
| 3238 | 2019-09-06 12:18:35 | 2019-09-06 12:31:49 | 794.0 |
| 3239 | 2019-09-06 14:48:06 | 2019-09-06 15:06:00 | 1074.0 |
| 3240 | 2019-09-06 16:01:34 | 2019-09-06 16:24:28 | 1374.0 |
| 3241 | 2019-09-06 17:23:15 | 2019-09-06 17:26:33 | 198.0 |
3129 rows × 3 columns
| gas_quantities | |
|---|---|
| key | |
| 1 | 29.749986 |
| 2 | 12.555561 |
| 3 | 28.554793 |
| 4 | 18.841219 |
| 5 | 5.413692 |
| ... | ... |
| 3237 | 5.543905 |
| 3238 | 6.745669 |
| 3239 | 16.023518 |
| 3240 | 11.863103 |
| 3241 | 12.680959 |
3239 rows × 1 columns
| time_first_measurement | time_last_measurement | temperature_first_measurement | temperature_last_measurement | time_between_measurements | |
|---|---|---|---|---|---|
| key | |||||
| 1 | 2019-05-03 11:02:04 | 2019-05-03 11:30:38 | 1571.0 | 1613.0 | 1714.0 |
| 2 | 2019-05-03 11:34:04 | 2019-05-03 11:55:09 | 1581.0 | 1602.0 | 1265.0 |
| 3 | 2019-05-03 12:06:44 | 2019-05-03 12:35:57 | 1596.0 | 1599.0 | 1753.0 |
| 4 | 2019-05-03 12:39:27 | 2019-05-03 12:59:47 | 1601.0 | 1625.0 | 1220.0 |
| 5 | 2019-05-03 13:11:03 | 2019-05-03 13:36:39 | 1576.0 | 1602.0 | 1536.0 |
| ... | ... | ... | ... | ... | ... |
| 2495 | 2019-08-10 11:27:47 | 2019-08-10 11:50:47 | 1570.0 | 1591.0 | 1380.0 |
| 2496 | 2019-08-10 11:56:48 | 2019-08-10 12:25:13 | 1554.0 | 1591.0 | 1705.0 |
| 2497 | 2019-08-10 12:37:26 | 2019-08-10 12:53:28 | 1571.0 | 1589.0 | 962.0 |
| 2498 | 2019-08-10 12:58:11 | 2019-08-10 13:23:31 | 1591.0 | 1594.0 | 1520.0 |
| 2499 | 2019-08-10 13:33:21 | 2019-08-10 13:58:58 | 1569.0 | 1603.0 | 1537.0 |
2471 rows × 5 columns
| wire_1 | wire_2 | wire_3 | wire_4 | wire_6 | wire_7 | wire_8 | wire_9 | wire_sum | |
|---|---|---|---|---|---|---|---|---|---|
| key | |||||||||
| 1 | 60.059998 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 60.059998 |
| 2 | 96.052315 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 96.052315 |
| 3 | 91.160157 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 91.160157 |
| 4 | 89.063515 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 89.063515 |
| 5 | 89.238236 | 9.11456 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 98.352796 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3237 | 38.088959 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 38.088959 |
| 3238 | 56.128799 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 56.128799 |
| 3239 | 143.357761 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 143.357761 |
| 3240 | 34.070400 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 34.070400 |
| 3241 | 63.117595 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 63.117595 |
3081 rows × 9 columns
| start_time_wire | finish_time_wire | duration_wire | |
|---|---|---|---|
| key | |||
| 1 | 2019-05-03 11:06:19 | 2019-05-03 11:06:19 | 0.0 |
| 2 | 2019-05-03 11:36:50 | 2019-05-03 11:36:50 | 0.0 |
| 3 | 2019-05-03 12:11:46 | 2019-05-03 12:11:46 | 0.0 |
| 4 | 2019-05-03 12:43:22 | 2019-05-03 12:43:22 | 0.0 |
| 5 | 2019-05-03 13:15:34 | 2019-05-03 13:20:44 | 310.0 |
| ... | ... | ... | ... |
| 3237 | 2019-09-06 11:33:38 | 2019-09-06 11:33:38 | 0.0 |
| 3238 | 2019-09-06 12:18:35 | 2019-09-06 12:18:35 | 0.0 |
| 3239 | 2019-09-06 14:36:11 | 2019-09-06 14:36:11 | 0.0 |
| 3240 | 2019-09-06 15:33:55 | 2019-09-06 15:33:55 | 0.0 |
| 3241 | 2019-09-06 17:10:06 | 2019-09-06 17:10:06 | 0.0 |
3081 rows × 3 columns
Составим окончательную таблицу для обучения моделей, включая только те партии, для которых доступна информация по каждому этапу подготовки сплава. Причина такого выбора состоит в том, что каждая партия проходит все этапы подготовки.
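Принцип объединения по индексу с `how='inner'` (в итог попадают только ключи, присутствующие во всех таблицах) можно показать на условном мини-примере:

```python
import pandas as pd

left = pd.DataFrame({'temp': [1600, 1590]}, index=pd.Index([1, 2], name='key'))
right = pd.DataFrame({'gas': [29.7]}, index=pd.Index([1], name='key'))
other = pd.DataFrame({'wire': [60.1, 34.0]}, index=pd.Index([1, 3], name='key'))

# inner-join по индексу оставляет только key=1 — единственный общий ключ
joined = left.join([right, other], how='inner')
print(joined.index.tolist())  # [1]
```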
# Combine the 7 tables into one (joining onto the shortest one)
unified_table = df_result_temp.join(
    [df_arc_key, df_bulk, df_bulk_time, df_gas, df_wire, df_wire_time], how='inner')
unified_table
| time_first_measurement | time_last_measurement | temperature_first_measurement | temperature_last_measurement | time_between_measurements | count_arc | start_arc | end_arc | active_power | reactive_power | ... | wire_3 | wire_4 | wire_6 | wire_7 | wire_8 | wire_9 | wire_sum | start_time_wire | finish_time_wire | duration_wire | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| key | |||||||||||||||||||||
| 1 | 2019-05-03 11:02:04 | 2019-05-03 11:30:38 | 1571.0 | 1613.0 | 1714.0 | 5 | 2019-05-03 11:02:14 | 2019-05-03 11:28:37 | 3.036730 | 2.142821 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 60.059998 | 2019-05-03 11:06:19 | 2019-05-03 11:06:19 | 0.0 |
| 2 | 2019-05-03 11:34:04 | 2019-05-03 11:55:09 | 1581.0 | 1602.0 | 1265.0 | 4 | 2019-05-03 11:34:14 | 2019-05-03 11:53:18 | 2.139408 | 1.453357 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 96.052315 | 2019-05-03 11:36:50 | 2019-05-03 11:36:50 | 0.0 |
| 3 | 2019-05-03 12:06:44 | 2019-05-03 12:35:57 | 1596.0 | 1599.0 | 1753.0 | 5 | 2019-05-03 12:06:54 | 2019-05-03 12:32:19 | 4.063641 | 2.937457 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 91.160157 | 2019-05-03 12:11:46 | 2019-05-03 12:11:46 | 0.0 |
| 4 | 2019-05-03 12:39:27 | 2019-05-03 12:59:47 | 1601.0 | 1625.0 | 1220.0 | 4 | 2019-05-03 12:39:37 | 2019-05-03 12:57:50 | 2.706489 | 2.056992 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 89.063515 | 2019-05-03 12:43:22 | 2019-05-03 12:43:22 | 0.0 |
| 5 | 2019-05-03 13:11:03 | 2019-05-03 13:36:39 | 1576.0 | 1602.0 | 1536.0 | 4 | 2019-05-03 13:11:13 | 2019-05-03 13:33:55 | 2.252950 | 1.687991 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 98.352796 | 2019-05-03 13:15:34 | 2019-05-03 13:20:44 | 310.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2495 | 2019-08-10 11:27:47 | 2019-08-10 11:50:47 | 1570.0 | 1591.0 | 1380.0 | 4 | 2019-08-10 11:27:57 | 2019-08-10 11:48:05 | 3.168133 | 2.210936 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 89.150879 | 2019-08-10 11:31:40 | 2019-08-10 11:31:40 | 0.0 |
| 2496 | 2019-08-10 11:56:48 | 2019-08-10 12:25:13 | 1554.0 | 1591.0 | 1705.0 | 6 | 2019-08-10 11:56:58 | 2019-08-10 12:23:07 | 4.174918 | 2.872031 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 114.179527 | 2019-08-10 11:59:10 | 2019-08-10 11:59:10 | 0.0 |
| 2497 | 2019-08-10 12:37:26 | 2019-08-10 12:53:28 | 1571.0 | 1589.0 | 962.0 | 3 | 2019-08-10 12:37:36 | 2019-08-10 12:51:20 | 3.605239 | 2.452092 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 103.134723 | 2019-08-10 12:40:30 | 2019-08-10 12:46:10 | 340.0 |
| 2498 | 2019-08-10 12:58:11 | 2019-08-10 13:23:31 | 1591.0 | 1594.0 | 1520.0 | 5 | 2019-08-10 12:58:21 | 2019-08-10 13:20:59 | 3.202310 | 2.239820 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 118.110717 | 2019-08-10 13:02:54 | 2019-08-10 13:02:54 | 0.0 |
| 2499 | 2019-08-10 13:33:21 | 2019-08-10 13:58:58 | 1569.0 | 1603.0 | 1537.0 | 4 | 2019-08-10 13:33:31 | 2019-08-10 13:56:17 | 1.737084 | 1.296836 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 160.166238 | 2019-08-10 13:38:56 | 2019-08-10 13:45:26 | 390.0 |
2325 rows × 44 columns
Since the target feature is the last measured temperature, it is important that no further manipulation of the alloy takes place after that measurement.
# Count the rows in `unified_table` where `time_last_measurement` is not
# the latest timestamp among all datetime columns
datetime_cols = unified_table.select_dtypes('datetime').columns
(unified_table['time_last_measurement']
 < unified_table[datetime_cols].max(axis=1)).sum()
0
# Drop all columns with datetime dtype
unified_table = unified_table.drop(unified_table.select_dtypes('datetime').columns, axis=1)
unified_table
| temperature_first_measurement | temperature_last_measurement | time_between_measurements | count_arc | active_power | reactive_power | apparent_power | arc_duration | energy_consumption | bulk_1 | ... | wire_1 | wire_2 | wire_3 | wire_4 | wire_6 | wire_7 | wire_8 | wire_9 | wire_sum | duration_wire | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| key | |||||||||||||||||||||
| 1 | 1571.0 | 1613.0 | 1714.0 | 5 | 3.036730 | 2.142821 | 3.718736 | 1098.0 | 770.282114 | 0.0 | ... | 60.059998 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 60.059998 | 0.0 |
| 2 | 1581.0 | 1602.0 | 1265.0 | 4 | 2.139408 | 1.453357 | 2.588349 | 811.0 | 481.760005 | 0.0 | ... | 96.052315 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 96.052315 | 0.0 |
| 3 | 1596.0 | 1599.0 | 1753.0 | 5 | 4.063641 | 2.937457 | 5.019223 | 655.0 | 722.837668 | 0.0 | ... | 91.160157 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 91.160157 | 0.0 |
| 4 | 1601.0 | 1625.0 | 1220.0 | 4 | 2.706489 | 2.056992 | 3.400038 | 741.0 | 683.455597 | 0.0 | ... | 89.063515 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 89.063515 | 0.0 |
| 5 | 1576.0 | 1602.0 | 1536.0 | 4 | 2.252950 | 1.687991 | 2.816980 | 869.0 | 512.169934 | 0.0 | ... | 89.238236 | 9.11456 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 98.352796 | 310.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2495 | 1570.0 | 1591.0 | 1380.0 | 4 | 3.168133 | 2.210936 | 3.868721 | 723.0 | 694.177326 | 0.0 | ... | 89.150879 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 89.150879 | 0.0 |
| 2496 | 1554.0 | 1591.0 | 1705.0 | 6 | 4.174918 | 2.872031 | 5.070316 | 940.0 | 815.818538 | 0.0 | ... | 114.179527 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 114.179527 | 0.0 |
| 2497 | 1571.0 | 1589.0 | 962.0 | 3 | 3.605239 | 2.452092 | 4.360918 | 569.0 | 823.020520 | 0.0 | ... | 94.086723 | 9.04800 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 103.134723 | 340.0 |
| 2498 | 1591.0 | 1594.0 | 1520.0 | 5 | 3.202310 | 2.239820 | 3.909917 | 750.0 | 581.810739 | 0.0 | ... | 118.110717 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 118.110717 | 0.0 |
| 2499 | 1569.0 | 1603.0 | 1537.0 | 4 | 1.737084 | 1.296836 | 2.169252 | 883.0 | 532.386183 | 0.0 | ... | 110.160958 | 50.00528 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 160.166238 | 390.0 |
2325 rows × 36 columns
# Function to visualize highly correlated features
def plot_highly_correlated_features(df):
    # Compute the correlation matrix and flatten it into pairs
    high_corr_pairs = df.corr().unstack()
    # Keep only pairs with |correlation| between 0.7 and 1
    high_corr_pairs = high_corr_pairs[(high_corr_pairs < 1) & (abs(high_corr_pairs) > 0.7)]
    # Check whether any highly correlated features exist
    if high_corr_pairs.empty:
        print("No highly correlated features")
    else:
        # Extract the unique feature names from the pair index
        high_corr_features = high_corr_pairs.index.get_level_values(0).unique()
        # Build a DataFrame containing only the highly correlated features
        highly_corr_df = df[high_corr_features]
        # Compute the correlation matrix of those features
        corr_matrix = highly_corr_df.corr()
        # Set the figure size
        sns.set(rc={'figure.figsize': (15, 5)})
        # Draw a heatmap of the correlation matrix (upper triangle masked)
        sns.heatmap(corr_matrix,
                    annot=True,
                    cmap='turbo',
                    vmin=-1,
                    vmax=1,
                    center=0,
                    cbar_kws={'orientation': 'vertical'},
                    mask=np.triu(np.ones_like(corr_matrix)))
        # Add a title
        plt.title('Correlation matrix of highly correlated features',
                  fontsize=15,
                  color='DarkSlateGray')
        plt.show()
plot_highly_correlated_features(unified_table)
We noted the strong positive correlation between active and reactive power at the very start of data processing, and computed the apparent power, which combines both parameters. We also computed the arc duration and, multiplying it by the apparent power, obtained the energy consumption; the energy consumption therefore combines all four parameters. We can keep only the energy consumption, so as not to create redundant relationships between features and confuse the model.
The total amount of bulk materials correlates with material bulk_12, and the amounts of some of the added materials correlate with each other; this appears to be a feature of the technological process. We will not alter relationships internal to the process.
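The relationship described above can be sketched on synthetic numbers (hypothetical values, assuming apparent power is the vector sum of active and reactive power, and energy is apparent power times arc duration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    'active_power': rng.uniform(1.0, 5.0, 200),
    'reactive_power': rng.uniform(1.0, 3.0, 200),
    'arc_duration': rng.uniform(500.0, 1200.0, 200),
})
# Apparent power combines active and reactive power ...
df['apparent_power'] = np.sqrt(df['active_power'] ** 2 + df['reactive_power'] ** 2)
# ... and consumed energy combines apparent power and arc duration
df['energy_consumption'] = df['apparent_power'] * df['arc_duration']

# energy_consumption is a deterministic function of the other columns,
# so keeping all of them only adds redundant, correlated features
print(df.corr()['energy_consumption'].round(2))
```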
# Drop the active, reactive, and apparent power columns and the arc duration
unified_table.drop(['active_power', 'reactive_power',
                    'arc_duration', 'apparent_power'], axis=1, inplace=True)
plot_highly_correlated_features(unified_table)
Some variables are still highly correlated with each other. For example, energy_consumption correlates strongly with count_arc (0.71) and moderately with bulk_sum (0.5), while bulk_sum in turn correlates strongly with bulk_12 (0.87). We will drop count_arc and bulk_sum.
There is also a close relationship between bulk_7 and wire_4, and between bulk_9 and wire_8.
Such high correlation may indicate multicollinearity between these variables. Multicollinearity can make regression models unstable and ambiguous, and complicate the interpretation of each variable's importance.
To remove the multicollinearity, we will drop count_arc and bulk_sum, and also merge bulk_7 with wire_4 and bulk_9 with wire_8 pairwise.
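Pairwise heatmaps can miss collinearity that involves three or more features at once; the variance inflation factor (VIF) catches it. A minimal sketch using plain NumPy least squares (the `vif_table` helper and the demo data are ours, not part of the notebook):

```python
import numpy as np
import pandas as pd

def vif_table(df: pd.DataFrame) -> pd.Series:
    """VIF of each column: 1 / (1 - R^2) of regressing it on the others."""
    vifs = {}
    X = df.to_numpy(dtype=float)
    for i, col in enumerate(df.columns):
        y = X[:, i]
        # Regress column i on the remaining columns plus an intercept
        A = np.column_stack([np.delete(X, i, axis=1), np.ones(len(y))])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        r2 = 1.0 - (y - A @ coef).var() / y.var()
        vifs[col] = 1.0 / (1.0 - r2)
    return pd.Series(vifs).sort_values(ascending=False)

# Synthetic demo: z is (almost) a linear combination of x and y,
# while w is independent of everything else
rng = np.random.default_rng(0)
demo = pd.DataFrame({'x': rng.normal(size=300),
                     'y': rng.normal(size=300),
                     'w': rng.normal(size=300)})
demo['z'] = demo['x'] + demo['y'] + rng.normal(scale=0.05, size=300)
print(vif_table(demo).round(1))  # x, y, z get large VIFs; w stays near 1
```

Values above roughly 5–10 are the usual warning threshold; a feature like `z` here is the analogue of bulk_sum in the notebook.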
# Drop the number of processing stages and the total amount of bulk materials
unified_table.drop(['count_arc', 'bulk_sum'], axis=1, inplace=True)
# Add the bulk_7_wire_4 column
unified_table['bulk_7_wire_4'] = unified_table['bulk_7'] + unified_table['wire_4']
# Add the bulk_9_wire_8 column
unified_table['bulk_9_wire_8'] = unified_table['bulk_9'] + unified_table['wire_8']
# Drop the bulk_7, wire_4, bulk_9 and wire_8 columns
unified_table.drop(['bulk_7', 'wire_4', 'bulk_9', 'wire_8'], axis=1, inplace=True)
unified_table
| temperature_first_measurement | temperature_last_measurement | time_between_measurements | energy_consumption | bulk_1 | bulk_2 | bulk_3 | bulk_4 | bulk_5 | bulk_6 | ... | wire_1 | wire_2 | wire_3 | wire_6 | wire_7 | wire_9 | wire_sum | duration_wire | bulk_7_wire_4 | bulk_9_wire_8 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| key | |||||||||||||||||||||
| 1 | 1571.0 | 1613.0 | 1714.0 | 770.282114 | 0.0 | 0.0 | 0.0 | 43.0 | 0.0 | 0.0 | ... | 60.059998 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 60.059998 | 0.0 | 0.0 | 0.0 |
| 2 | 1581.0 | 1602.0 | 1265.0 | 481.760005 | 0.0 | 0.0 | 0.0 | 73.0 | 0.0 | 0.0 | ... | 96.052315 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 96.052315 | 0.0 | 0.0 | 0.0 |
| 3 | 1596.0 | 1599.0 | 1753.0 | 722.837668 | 0.0 | 0.0 | 0.0 | 34.0 | 0.0 | 0.0 | ... | 91.160157 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 91.160157 | 0.0 | 0.0 | 0.0 |
| 4 | 1601.0 | 1625.0 | 1220.0 | 683.455597 | 0.0 | 0.0 | 0.0 | 81.0 | 0.0 | 0.0 | ... | 89.063515 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 89.063515 | 0.0 | 0.0 | 0.0 |
| 5 | 1576.0 | 1602.0 | 1536.0 | 512.169934 | 0.0 | 0.0 | 0.0 | 78.0 | 0.0 | 0.0 | ... | 89.238236 | 9.11456 | 0.0 | 0.0 | 0.0 | 0.0 | 98.352796 | 310.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2495 | 1570.0 | 1591.0 | 1380.0 | 694.177326 | 0.0 | 0.0 | 21.0 | 0.0 | 0.0 | 0.0 | ... | 89.150879 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 89.150879 | 0.0 | 0.0 | 0.0 |
| 2496 | 1554.0 | 1591.0 | 1705.0 | 815.818538 | 0.0 | 0.0 | 0.0 | 63.0 | 0.0 | 0.0 | ... | 114.179527 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 114.179527 | 0.0 | 0.0 | 0.0 |
| 2497 | 1571.0 | 1589.0 | 962.0 | 823.020520 | 0.0 | 0.0 | 0.0 | 85.0 | 0.0 | 0.0 | ... | 94.086723 | 9.04800 | 0.0 | 0.0 | 0.0 | 0.0 | 103.134723 | 340.0 | 0.0 | 0.0 |
| 2498 | 1591.0 | 1594.0 | 1520.0 | 581.810739 | 0.0 | 0.0 | 90.0 | 0.0 | 0.0 | 0.0 | ... | 118.110717 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 118.110717 | 0.0 | 0.0 | 0.0 |
| 2499 | 1569.0 | 1603.0 | 1537.0 | 532.386183 | 0.0 | 0.0 | 47.0 | 0.0 | 0.0 | 0.0 | ... | 110.160958 | 50.00528 | 0.0 | 0.0 | 0.0 | 0.0 | 160.166238 | 390.0 | 0.0 | 0.0 |
2325 rows × 28 columns
plot_highly_correlated_features(unified_table)
# Add the bulk_7_wire_4_bulk_2 column
unified_table['bulk_7_wire_4_bulk_2'] = unified_table['bulk_7_wire_4'] + unified_table['bulk_2']
# Drop the bulk_7_wire_4 and bulk_2 columns
unified_table.drop(['bulk_7_wire_4', 'bulk_2'], axis=1, inplace=True)
plot_highly_correlated_features(unified_table)
No highly correlated features
No strongly correlated feature pairs remain.
The final table:
unified_table
| temperature_first_measurement | temperature_last_measurement | time_between_measurements | energy_consumption | bulk_1 | bulk_3 | bulk_4 | bulk_5 | bulk_6 | bulk_10 | ... | wire_1 | wire_2 | wire_3 | wire_6 | wire_7 | wire_9 | wire_sum | duration_wire | bulk_9_wire_8 | bulk_7_wire_4_bulk_2 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| key | |||||||||||||||||||||
| 1 | 1571.0 | 1613.0 | 1714.0 | 770.282114 | 0.0 | 0.0 | 43.0 | 0.0 | 0.0 | 0.0 | ... | 60.059998 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 60.059998 | 0.0 | 0.0 | 0.0 |
| 2 | 1581.0 | 1602.0 | 1265.0 | 481.760005 | 0.0 | 0.0 | 73.0 | 0.0 | 0.0 | 0.0 | ... | 96.052315 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 96.052315 | 0.0 | 0.0 | 0.0 |
| 3 | 1596.0 | 1599.0 | 1753.0 | 722.837668 | 0.0 | 0.0 | 34.0 | 0.0 | 0.0 | 0.0 | ... | 91.160157 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 91.160157 | 0.0 | 0.0 | 0.0 |
| 4 | 1601.0 | 1625.0 | 1220.0 | 683.455597 | 0.0 | 0.0 | 81.0 | 0.0 | 0.0 | 0.0 | ... | 89.063515 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 89.063515 | 0.0 | 0.0 | 0.0 |
| 5 | 1576.0 | 1602.0 | 1536.0 | 512.169934 | 0.0 | 0.0 | 78.0 | 0.0 | 0.0 | 0.0 | ... | 89.238236 | 9.11456 | 0.0 | 0.0 | 0.0 | 0.0 | 98.352796 | 310.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2495 | 1570.0 | 1591.0 | 1380.0 | 694.177326 | 0.0 | 21.0 | 0.0 | 0.0 | 0.0 | 90.0 | ... | 89.150879 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 89.150879 | 0.0 | 0.0 | 0.0 |
| 2496 | 1554.0 | 1591.0 | 1705.0 | 815.818538 | 0.0 | 0.0 | 63.0 | 0.0 | 0.0 | 122.0 | ... | 114.179527 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 114.179527 | 0.0 | 0.0 | 0.0 |
| 2497 | 1571.0 | 1589.0 | 962.0 | 823.020520 | 0.0 | 0.0 | 85.0 | 0.0 | 0.0 | 0.0 | ... | 94.086723 | 9.04800 | 0.0 | 0.0 | 0.0 | 0.0 | 103.134723 | 340.0 | 0.0 | 0.0 |
| 2498 | 1591.0 | 1594.0 | 1520.0 | 581.810739 | 0.0 | 90.0 | 0.0 | 0.0 | 0.0 | 101.0 | ... | 118.110717 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 118.110717 | 0.0 | 0.0 | 0.0 |
| 2499 | 1569.0 | 1603.0 | 1537.0 | 532.386183 | 0.0 | 47.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 110.160958 | 50.00528 | 0.0 | 0.0 | 0.0 | 0.0 | 160.166238 | 390.0 | 0.0 | 0.0 |
2325 rows × 27 columns
# Empty Series to hold each feature's correlation with the target
corr_target = pd.Series(dtype='float64')
# Iterate over the table columns
for col in unified_table.columns:
    # Correlation between the column and the target, stored in the Series
    corr_target[col] = unified_table['temperature_last_measurement'].corr(
        unified_table[col])
# Sort in descending order and drop the target's self-correlation
corr_target = corr_target.sort_values(ascending=False)[1:]
# Draw the plot
plt.figure(figsize=(15, 5))
sns.barplot(x=corr_target.index,
            y=corr_target.values)
plt.xticks(rotation=90)
plt.title('Feature correlation with the last temperature measurement',
          fontsize=15,
          color='DarkSlateGray')
plt.xlabel('Features',
           fontsize=12,
           color='DarkSlateGray')
plt.ylabel('Correlation',
           fontsize=12,
           color='DarkSlateGray')
plt.minorticks_on()
plt.grid(which='minor',
         linestyle=':')
plt.grid(True)
plt.show()
No feature shows a direct linear relationship with the target. Without building a model we cannot say what the temperature at the final stage depends on.
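A near-zero correlation only rules out a *linear* relationship; as a hedge, mutual information can detect nonlinear dependence. A self-contained sketch on synthetic data (`mutual_info_regression` would accept the notebook's `features_train`/`target_train` in the same way):

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
X = pd.DataFrame({'useful': rng.uniform(-3, 3, 500),
                  'noise': rng.uniform(-3, 3, 500)})
# Quadratic dependence: Pearson correlation is near zero, but the signal is real
y = X['useful'] ** 2 + rng.normal(scale=0.1, size=500)

mi = pd.Series(mutual_info_regression(X, y, random_state=0), index=X.columns)
print(mi.sort_values(ascending=False))  # 'useful' scores far above 'noise'
```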
# Split the data into training and test sets
train, test = train_test_split(
    unified_table, test_size=0.25, random_state=RANDOM_STATE)
# Target for the training set
target_train = train['temperature_last_measurement']
# Features for the training set
features_train = train.drop(['temperature_last_measurement'], axis=1)
# Target for the test set
target_test = test['temperature_last_measurement']
# Features for the test set
features_test = test.drop(['temperature_last_measurement'], axis=1)
# Display the result
display(target_train, features_train, target_test, features_test)
key
1973 1579.0
802 1592.0
959 1617.0
680 1588.0
970 1579.0
...
119 1593.0
809 1604.0
1775 1596.0
1966 1578.0
1580 1593.0
Name: temperature_last_measurement, Length: 1743, dtype: float64
| temperature_first_measurement | time_between_measurements | energy_consumption | bulk_1 | bulk_3 | bulk_4 | bulk_5 | bulk_6 | bulk_10 | bulk_11 | ... | wire_1 | wire_2 | wire_3 | wire_6 | wire_7 | wire_9 | wire_sum | duration_wire | bulk_9_wire_8 | bulk_7_wire_4_bulk_2 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| key | |||||||||||||||||||||
| 1973 | 1562.0 | 1066.0 | 598.994020 | 0.0 | 310.0 | 0.0 | 0.0 | 93.0 | 0.0 | 0.0 | ... | 40.098240 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 40.098240 | 0.0 | 0.0 | 0.0 |
| 802 | 1580.0 | 3871.0 | 688.957621 | 0.0 | 190.0 | 0.0 | 0.0 | 46.0 | 0.0 | 0.0 | ... | 29.249999 | 80.376400 | 0.000000 | 0.000000 | 0.0 | 0.0 | 109.626399 | 546.0 | 0.0 | 0.0 |
| 959 | 1586.0 | 8222.0 | 1229.980397 | 0.0 | 66.0 | 23.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 123.037201 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 123.037201 | 0.0 | 0.0 | 0.0 |
| 680 | 1613.0 | 2993.0 | 578.532433 | 0.0 | 44.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 120.042000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 120.042000 | 0.0 | 0.0 | 0.0 |
| 970 | 1574.0 | 1498.0 | 219.296917 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 84.099601 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 84.099601 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 119 | 1628.0 | 3897.0 | 200.972392 | 51.0 | 0.0 | 50.0 | 86.0 | 0.0 | 0.0 | 0.0 | ... | 0.000000 | 0.000000 | 93.117027 | 43.174561 | 0.0 | 0.0 | 136.291588 | 1926.0 | 0.0 | 233.0 |
| 809 | 1587.0 | 2795.0 | 856.165815 | 0.0 | 405.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 137.638809 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 137.638809 | 0.0 | 0.0 | 0.0 |
| 1775 | 1604.0 | 1744.0 | 351.696370 | 0.0 | 21.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 145.126801 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 145.126801 | 0.0 | 0.0 | 0.0 |
| 1966 | 1527.0 | 2816.0 | 1112.650279 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 97.0 | ... | 65.126877 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 65.126877 | 0.0 | 0.0 | 0.0 |
| 1580 | 1579.0 | 1327.0 | 595.244297 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 13.993200 | 60.078716 | 0.000000 | 0.000000 | 0.0 | 0.0 | 74.071916 | 450.0 | 0.0 | 0.0 |
1743 rows × 26 columns
key
1480 1597.0
2322 1590.0
673 1614.0
660 1590.0
1357 1600.0
...
335 1612.0
2151 1620.0
537 1585.0
650 1621.0
569 1625.0
Name: temperature_last_measurement, Length: 582, dtype: float64
| temperature_first_measurement | time_between_measurements | energy_consumption | bulk_1 | bulk_3 | bulk_4 | bulk_5 | bulk_6 | bulk_10 | bulk_11 | ... | wire_1 | wire_2 | wire_3 | wire_6 | wire_7 | wire_9 | wire_sum | duration_wire | bulk_9_wire_8 | bulk_7_wire_4_bulk_2 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| key | |||||||||||||||||||||
| 1480 | 1567.0 | 1255.0 | 443.225773 | 0.0 | 0.0 | 107.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 122.990398 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 122.990398 | 0.0 | 0.0 | 0.0 |
| 2322 | 1616.0 | 3225.0 | 793.032361 | 52.0 | 154.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 172.273911 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 172.273911 | 0.0 | 0.0 | 0.0 |
| 673 | 1609.0 | 2671.0 | 715.322078 | 0.0 | 87.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 134.877605 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 134.877605 | 0.0 | 0.0 | 0.0 |
| 660 | 1619.0 | 3440.0 | 768.955278 | 0.0 | 0.0 | 110.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 76.143600 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 76.143600 | 0.0 | 0.0 | 0.0 |
| 1357 | 1538.0 | 1989.0 | 1046.797454 | 0.0 | 0.0 | 231.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 105.112795 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 105.112795 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 335 | 1564.0 | 4258.0 | 1556.605508 | 0.0 | 0.0 | 0.0 | 0.0 | 162.0 | 0.0 | 0.0 | ... | 73.429200 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 73.429200 | 0.0 | 0.0 | 0.0 |
| 2151 | 1622.0 | 7772.0 | 1257.999589 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 126.0 | 0.0 | ... | 145.236007 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 145.236007 | 0.0 | 0.0 | 0.0 |
| 537 | 1590.0 | 4248.0 | 974.347090 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 100.011597 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 100.011597 | 0.0 | 0.0 | 0.0 |
| 650 | 1577.0 | 6106.0 | 1885.358882 | 0.0 | 0.0 | 19.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 170.398798 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 170.398798 | 0.0 | 0.0 | 0.0 |
| 569 | 1599.0 | 1352.0 | 862.814124 | 0.0 | 290.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 49.186798 | 60.169201 | 0.0 | 0.0 | 0.0 | 0.0 | 109.355999 | 477.0 | 0.0 | 0.0 |
582 rows × 26 columns
# Get the names of the numeric features
numeric = features_train.columns
# Create a scaler instance
scaler = StandardScaler()
# Fit the scaler on the training data and transform it
scaler.fit(features_train[numeric])
features_train[numeric] = scaler.transform(features_train[numeric])
# Apply the same transformation to the test data
features_test[numeric] = scaler.transform(features_test[numeric])
# Display the transformed training and test data
display(features_train, features_test)
| temperature_first_measurement | time_between_measurements | energy_consumption | bulk_1 | bulk_3 | bulk_4 | bulk_5 | bulk_6 | bulk_10 | bulk_11 | ... | wire_1 | wire_2 | wire_3 | wire_6 | wire_7 | wire_9 | wire_sum | duration_wire | bulk_9_wire_8 | bulk_7_wire_4_bulk_2 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| key | |||||||||||||||||||||
| 1973 | -1.068818 | -0.869244 | -0.222219 | -0.278179 | 3.468995 | -0.627861 | -0.125004 | 1.221972 | -0.243976 | -0.186131 | ... | -1.431246 | -0.524372 | -0.124387 | -0.120551 | -0.035756 | -0.091804 | -1.321379 | -0.522456 | -0.072593 | -0.076135 |
| 802 | -0.304182 | 1.037816 | 0.057718 | -0.278179 | 1.879286 | -0.627861 | -0.125004 | 0.404780 | -0.243976 | -0.186131 | ... | -1.680569 | 1.880855 | -0.124387 | -0.120551 | -0.035756 | -0.091804 | -0.245646 | 0.794695 | -0.072593 | -0.076135 |
| 959 | -0.049304 | 3.995967 | 1.741205 | -0.278179 | 0.236587 | -0.238342 | -0.125004 | -0.395024 | -0.243976 | -0.186131 | ... | 0.474928 | -0.524372 | -0.124387 | -0.120551 | -0.035756 | -0.091804 | -0.038155 | -0.522456 | -0.072593 | -0.076135 |
| 680 | 1.097649 | 0.440882 | -0.285889 | -0.278179 | -0.054860 | -0.627861 | -0.125004 | -0.395024 | -0.243976 | -0.186131 | ... | 0.406090 | -0.524372 | -0.124387 | -0.120551 | -0.035756 | -0.091804 | -0.084496 | -0.522456 | -0.072593 | -0.076135 |
| 970 | -0.559061 | -0.575536 | -1.403713 | -0.278179 | -0.637753 | -0.627861 | -0.125004 | -0.395024 | -0.243976 | -0.186131 | ... | -0.419969 | -0.524372 | -0.124387 | -0.120551 | -0.035756 | -0.091804 | -0.640594 | -0.522456 | -0.072593 | -0.076135 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 119 | 1.734846 | 1.055492 | -1.460733 | 4.077678 | -0.637753 | 0.218919 | 3.494437 | -0.395024 | -0.243976 | -0.186131 | ... | -2.352818 | -0.524372 | 3.085229 | 4.644601 | -0.035756 | -0.091804 | 0.166916 | 4.123757 | -0.072593 | 4.430674 |
| 809 | -0.006824 | 0.306266 | 0.578016 | -0.278179 | 4.727515 | -0.627861 | -0.125004 | -0.395024 | -0.243976 | -0.186131 | ... | 0.810515 | -0.524372 | -0.124387 | -0.120551 | -0.035756 | -0.091804 | 0.187760 | -0.522456 | -0.072593 | -0.076135 |
| 1775 | 0.715332 | -0.408286 | -0.991729 | -0.278179 | -0.359554 | -0.627861 | -0.125004 | -0.395024 | -0.243976 | -0.186131 | ... | 0.982610 | -0.524372 | -0.124387 | -0.120551 | -0.035756 | -0.091804 | 0.303614 | -0.522456 | -0.072593 | -0.076135 |
| 1966 | -2.555609 | 0.320544 | 1.376112 | -0.278179 | -0.637753 | -0.627861 | -0.125004 | -0.395024 | -0.243976 | 3.968230 | ... | -0.856016 | -0.524372 | -0.124387 | -0.120551 | -0.035756 | -0.091804 | -0.934138 | -0.522456 | -0.072593 | -0.076135 |
| 1580 | -0.346662 | -0.691795 | -0.233887 | -0.278179 | -0.637753 | -0.627861 | -0.125004 | -0.395024 | -0.243976 | -0.186131 | ... | -2.031214 | 1.273456 | -0.124387 | -0.120551 | -0.035756 | -0.091804 | -0.795742 | 0.563108 | -0.072593 | -0.076135 |
1743 rows × 26 columns
| temperature_first_measurement | time_between_measurements | energy_consumption | bulk_1 | bulk_3 | bulk_4 | bulk_5 | bulk_6 | bulk_10 | bulk_11 | ... | wire_1 | wire_2 | wire_3 | wire_6 | wire_7 | wire_9 | wire_sum | duration_wire | bulk_9_wire_8 | bulk_7_wire_4_bulk_2 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| key | |||||||||||||||||||||
| 1480 | -0.856419 | -0.740747 | -0.706919 | -0.278179 | -0.637753 | 1.184248 | -0.125004 | -0.395024 | -0.243976 | -0.186131 | ... | 0.473853 | -0.524372 | -0.124387 | -0.120551 | -0.035756 | -0.091804 | -0.038879 | -0.522456 | -0.072593 | -0.076135 |
| 2322 | 1.225089 | 0.598614 | 0.381565 | 4.163087 | 1.402373 | -0.627861 | -0.125004 | -0.395024 | -0.243976 | -0.186131 | ... | 1.606528 | -0.524372 | -0.124387 | -0.120551 | -0.035756 | -0.091804 | 0.723631 | -0.522456 | -0.072593 | -0.076135 |
| 673 | 0.927730 | 0.221961 | 0.139756 | -0.278179 | 0.514786 | -0.627861 | -0.125004 | -0.395024 | -0.243976 | -0.186131 | ... | 0.747054 | -0.524372 | -0.124387 | -0.120551 | -0.035756 | -0.091804 | 0.145039 | -0.522456 | -0.072593 | -0.076135 |
| 660 | 1.352528 | 0.744788 | 0.306645 | -0.278179 | -0.637753 | 1.235055 | -0.125004 | -0.395024 | -0.243976 | -0.186131 | ... | -0.602821 | -0.524372 | -0.124387 | -0.120551 | -0.035756 | -0.091804 | -0.763689 | -0.522456 | -0.072593 | -0.076135 |
| 1357 | -2.088332 | -0.241716 | 1.171199 | -0.278179 | -0.637753 | 3.284263 | -0.125004 | -0.395024 | -0.243976 | -0.186131 | ... | 0.062974 | -0.524372 | -0.124387 | -0.120551 | -0.035756 | -0.091804 | -0.315480 | -0.522456 | -0.072593 | -0.076135 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 335 | -0.983858 | 1.300929 | 2.757556 | -0.278179 | -0.637753 | -0.627861 | -0.125004 | 2.421678 | -0.243976 | -0.186131 | ... | -0.665205 | -0.524372 | -0.124387 | -0.120551 | -0.035756 | -0.091804 | -0.805686 | -0.522456 | -0.072593 | -0.076135 |
| 2151 | 1.479967 | 3.690021 | 1.828392 | -0.278179 | -0.637753 | -0.627861 | -0.125004 | -0.395024 | 5.645407 | -0.186131 | ... | 0.985120 | -0.524372 | -0.124387 | -0.120551 | -0.035756 | -0.091804 | 0.305303 | -0.522456 | -0.072593 | -0.076135 |
| 537 | 0.120615 | 1.294130 | 0.945757 | -0.278179 | -0.637753 | -0.627861 | -0.125004 | -0.395024 | -0.243976 | -0.186131 | ... | -0.054266 | -0.524372 | -0.124387 | -0.120551 | -0.035756 | -0.091804 | -0.394405 | -0.522456 | -0.072593 | -0.076135 |
| 650 | -0.431622 | 2.557344 | 3.780529 | -0.278179 | -0.637753 | -0.306085 | -0.125004 | -0.395024 | -0.243976 | -0.186131 | ... | 1.563433 | -0.524372 | -0.124387 | -0.120551 | -0.035756 | -0.091804 | 0.694620 | -0.522456 | -0.072593 | -0.076135 |
| 569 | 0.502933 | -0.674798 | 0.598703 | -0.278179 | 3.204044 | -0.627861 | -0.125004 | -0.395024 | -0.243976 | -0.186131 | ... | -1.222365 | 1.276164 | -0.124387 | -0.120551 | -0.035756 | -0.091804 | -0.249829 | 0.628242 | -0.072593 | -0.076135 |
582 rows × 26 columns
# Baseline: MAE of a constant prediction equal to the training mean
round(mae(target_train, np.ones(features_train.shape[0]) * target_train.mean()), 2)
7.88
We now have a reference point: MAE = 7.88.
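The same reference point can be produced with `DummyRegressor` (already imported at the top of the notebook); a sketch on synthetic stand-ins for the notebook's features and temperatures:

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                    # hypothetical features
y = rng.normal(loc=1600.0, scale=8.0, size=100)  # hypothetical temperatures

# strategy='mean' predicts the training mean for every sample,
# which matches the manual constant-prediction baseline above
baseline = DummyRegressor(strategy='mean').fit(X, y)
mae_baseline = mean_absolute_error(y, baseline.predict(X))
```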
# Create a `KFold` object to use for splitting the data
kf = KFold(n_splits=5, shuffle=True, random_state=RANDOM_STATE)
# Create a pipeline with scaling and linear regression
pipeline = make_pipeline(StandardScaler(), LinearRegression())
# Compute the metric via cross-validation
cv_MAE_LR = round(abs(cross_val_score(pipeline,
                                      features_train,
                                      target_train,
                                      cv=kf,
                                      scoring='neg_mean_absolute_error',
                                      n_jobs=-1)).mean(), 2)
print('Metric value:', cv_MAE_LR)
Metric value: 6.55 Wall time: 2.68 s
%%time
def objective(trial):
    # Define the hyperparameter search space
    n_estimators = trial.suggest_int('n_estimators', 10, 500)
    max_depth = trial.suggest_int('max_depth', 5, 30)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 6)
    min_samples_leaf = trial.suggest_int('min_samples_leaf', 1, 3)
    max_features = trial.suggest_categorical(
        'max_features', ['sqrt', 'log2', None])
    bootstrap = trial.suggest_categorical('bootstrap', [True, False])
    # Build a random forest with the sampled parameters
    model = RandomForestRegressor(n_estimators=n_estimators,
                                  max_depth=max_depth,
                                  min_samples_split=min_samples_split,
                                  min_samples_leaf=min_samples_leaf,
                                  max_features=max_features,
                                  bootstrap=bootstrap,
                                  random_state=RANDOM_STATE,
                                  n_jobs=-1)
    # Evaluate the model with cross-validation
    score = round(abs(cross_val_score(model,
                                      features_train,
                                      target_train,
                                      cv=kf,
                                      scoring='neg_mean_absolute_error',
                                      n_jobs=-1)).mean(), 2)
    # Return the metric value (already a scalar)
    return score
# Create a Study object and run the optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=50)
print('Best parameters:', study.best_params)
print('Best metric value:', study.best_value)
# Build a dataframe with the hyperparameter values and the metric
# (DataFrame.append was removed in pandas 2.0, so collect rows first)
rows = []
for trial in study.best_trials:
    params = dict(trial.params)
    params['score'] = trial.value
    rows.append(params)
results = pd.DataFrame(rows,
                       columns=['n_estimators', 'max_depth',
                                'min_samples_split', 'min_samples_leaf',
                                'max_features', 'bootstrap', 'score'])
# Sort by the metric (lower MAE is better)
results = results.sort_values(by='score', ascending=True)
# Show the 10 best models
display(results.head(10))
# Plot hyperparameter importances
optuna.visualization.matplotlib.plot_param_importances(study)
plt.xlabel('Hyperparameter importance',
           fontsize=12,
           color='DarkSlateGray')
plt.ylabel('Hyperparameters',
           fontsize=12,
           color='DarkSlateGray')
plt.suptitle('Hyperparameter importance',
             fontsize=15,
             color='DarkSlateGray')
plt.minorticks_on()
plt.grid(which='minor',
         linestyle=':')
plt.grid(True)
# Plot the optimization history
optuna.visualization.matplotlib.plot_optimization_history(study)
plt.tight_layout()
plt.xlabel('Trials',
           fontsize=12,
           color='DarkSlateGray')
plt.ylabel('Objective value',
           fontsize=12,
           color='DarkSlateGray')
plt.suptitle('Optimization history \n \n',
             fontsize=15,
             color='DarkSlateGray')
plt.minorticks_on()
plt.grid(which='minor',
         linestyle=':')
plt.grid(True)
plt.show()
[I 2023-08-24 10:15:19,986] A new study created in memory with name: no-name-c5093fcd-01c0-457c-9ffc-6126a90c4d3a
[I 2023-08-24 10:15:27,559] Trial 0 finished with value: 8.16 and parameters: {'n_estimators': 476, 'max_depth': 15, 'min_samples_split': 4, 'min_samples_leaf': 1, 'max_features': None, 'bootstrap': False}. Best is trial 0 with value: 8.16.
[I 2023-08-24 10:15:29,151] Trial 1 finished with value: 6.65 and parameters: {'n_estimators': 428, 'max_depth': 14, 'min_samples_split': 2, 'min_samples_leaf': 1, 'max_features': 'log2', 'bootstrap': True}. Best is trial 1 with value: 6.65.
[I 2023-08-24 10:15:30,342] Trial 2 finished with value: 6.62 and parameters: {'n_estimators': 343, 'max_depth': 22, 'min_samples_split': 4, 'min_samples_leaf': 3, 'max_features': 'sqrt', 'bootstrap': True}. Best is trial 2 with value: 6.62.
[I 2023-08-24 10:15:30,761] Trial 3 finished with value: 6.7 and parameters: {'n_estimators': 123, 'max_depth': 28, 'min_samples_split': 5, 'min_samples_leaf': 3, 'max_features': 'log2', 'bootstrap': True}. Best is trial 2 with value: 6.62.
[I 2023-08-24 10:15:31,851] Trial 4 finished with value: 6.59 and parameters: {'n_estimators': 259, 'max_depth': 28, 'min_samples_split': 4, 'min_samples_leaf': 2, 'max_features': 'log2', 'bootstrap': False}. Best is trial 4 with value: 6.59.
[I 2023-08-24 10:15:34,369] Trial 5 finished with value: 6.59 and parameters: {'n_estimators': 473, 'max_depth': 30, 'min_samples_split': 4, 'min_samples_leaf': 1, 'max_features': 'sqrt', 'bootstrap': False}. Best is trial 4 with value: 6.59.
[I 2023-08-24 10:15:34,568] Trial 6 finished with value: 6.78 and parameters: {'n_estimators': 31, 'max_depth': 30, 'min_samples_split': 2, 'min_samples_leaf': 1, 'max_features': 'log2', 'bootstrap': False}. Best is trial 4 with value: 6.59.
[I 2023-08-24 10:15:39,684] Trial 7 finished with value: 6.45 and parameters: {'n_estimators': 451, 'max_depth': 29, 'min_samples_split': 3, 'min_samples_leaf': 1, 'max_features': None, 'bootstrap': True}. Best is trial 7 with value: 6.45.
[I 2023-08-24 10:15:40,707] Trial 8 finished with value: 6.77 and parameters: {'n_estimators': 362, 'max_depth': 8, 'min_samples_split': 4, 'min_samples_leaf': 2, 'max_features': 'log2', 'bootstrap': True}. Best is trial 7 with value: 6.45.
[I 2023-08-24 10:15:41,719] Trial 9 finished with value: 6.67 and parameters: {'n_estimators': 301, 'max_depth': 13, 'min_samples_split': 3, 'min_samples_leaf': 1, 'max_features': 'log2', 'bootstrap': True}. Best is trial 7 with value: 6.45.
[I 2023-08-24 10:15:43,612] Trial 10 finished with value: 6.44 and parameters: {'n_estimators': 179, 'max_depth': 22, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 10 with value: 6.44.
[I 2023-08-24 10:15:45,237] Trial 11 finished with value: 6.44 and parameters: {'n_estimators': 160, 'max_depth': 23, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 10 with value: 6.44.
[I 2023-08-24 10:15:47,012] Trial 12 finished with value: 6.45 and parameters: {'n_estimators': 172, 'max_depth': 22, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 10 with value: 6.44.
[I 2023-08-24 10:15:48,750] Trial 13 finished with value: 6.45 and parameters: {'n_estimators': 171, 'max_depth': 21, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 10 with value: 6.44.
[I 2023-08-24 10:15:49,611] Trial 14 finished with value: 6.44 and parameters: {'n_estimators': 88, 'max_depth': 19, 'min_samples_split': 6, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 10 with value: 6.44.
[I 2023-08-24 10:15:51,790] Trial 15 finished with value: 6.45 and parameters: {'n_estimators': 211, 'max_depth': 25, 'min_samples_split': 5, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 10 with value: 6.44.
[I 2023-08-24 10:15:52,044] Trial 16 finished with value: 6.61 and parameters: {'n_estimators': 20, 'max_depth': 24, 'min_samples_split': 5, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 10 with value: 6.44.
[I 2023-08-24 10:15:53,041] Trial 17 finished with value: 6.43 and parameters: {'n_estimators': 102, 'max_depth': 18, 'min_samples_split': 6, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:15:53,440] Trial 18 finished with value: 6.62 and parameters: {'n_estimators': 83, 'max_depth': 17, 'min_samples_split': 5, 'min_samples_leaf': 3, 'max_features': 'sqrt', 'bootstrap': False}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:15:55,389] Trial 19 finished with value: 6.45 and parameters: {'n_estimators': 245, 'max_depth': 9, 'min_samples_split': 6, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:15:56,308] Trial 20 finished with value: 6.43 and parameters: {'n_estimators': 102, 'max_depth': 11, 'min_samples_split': 5, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:15:56,847] Trial 21 finished with value: 6.65 and parameters: {'n_estimators': 89, 'max_depth': 5, 'min_samples_split': 5, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:15:57,895] Trial 22 finished with value: 6.44 and parameters: {'n_estimators': 116, 'max_depth': 11, 'min_samples_split': 6, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:15:59,886] Trial 23 finished with value: 6.44 and parameters: {'n_estimators': 205, 'max_depth': 19, 'min_samples_split': 5, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:16:00,380] Trial 24 finished with value: 6.47 and parameters: {'n_estimators': 46, 'max_depth': 17, 'min_samples_split': 6, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:16:01,041] Trial 25 finished with value: 6.65 and parameters: {'n_estimators': 143, 'max_depth': 19, 'min_samples_split': 5, 'min_samples_leaf': 2, 'max_features': 'sqrt', 'bootstrap': True}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:16:01,880] Trial 26 finished with value: 8.08 and parameters: {'n_estimators': 61, 'max_depth': 12, 'min_samples_split': 6, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': False}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:16:03,873] Trial 27 finished with value: 6.45 and parameters: {'n_estimators': 198, 'max_depth': 16, 'min_samples_split': 5, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:16:05,138] Trial 28 finished with value: 6.44 and parameters: {'n_estimators': 118, 'max_depth': 25, 'min_samples_split': 6, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:16:07,139] Trial 29 finished with value: 7.21 and parameters: {'n_estimators': 236, 'max_depth': 6, 'min_samples_split': 5, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': False}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:16:07,413] Trial 30 finished with value: 6.7 and parameters: {'n_estimators': 65, 'max_depth': 15, 'min_samples_split': 6, 'min_samples_leaf': 3, 'max_features': 'sqrt', 'bootstrap': True}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:16:08,995] Trial 31 finished with value: 6.45 and parameters: {'n_estimators': 157, 'max_depth': 21, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:16:10,292] Trial 32 finished with value: 6.44 and parameters: {'n_estimators': 126, 'max_depth': 23, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:16:12,241] Trial 33 finished with value: 6.45 and parameters: {'n_estimators': 182, 'max_depth': 26, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:16:13,819] Trial 34 finished with value: 6.45 and parameters: {'n_estimators': 144, 'max_depth': 20, 'min_samples_split': 5, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:16:16,801] Trial 35 finished with value: 6.44 and parameters: {'n_estimators': 296, 'max_depth': 15, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:16:17,867] Trial 36 finished with value: 6.43 and parameters: {'n_estimators': 107, 'max_depth': 27, 'min_samples_split': 5, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:16:18,448] Trial 37 finished with value: 6.61 and parameters: {'n_estimators': 108, 'max_depth': 27, 'min_samples_split': 4, 'min_samples_leaf': 3, 'max_features': 'sqrt', 'bootstrap': False}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:16:18,693] Trial 38 finished with value: 6.74 and parameters: {'n_estimators': 57, 'max_depth': 11, 'min_samples_split': 3, 'min_samples_leaf': 3, 'max_features': 'log2', 'bootstrap': True}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:16:19,647] Trial 39 finished with value: 6.45 and parameters: {'n_estimators': 86, 'max_depth': 28, 'min_samples_split': 5, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:16:19,785] Trial 40 finished with value: 6.74 and parameters: {'n_estimators': 11, 'max_depth': 13, 'min_samples_split': 4, 'min_samples_leaf': 3, 'max_features': 'log2', 'bootstrap': False}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:16:21,464] Trial 41 finished with value: 6.45 and parameters: {'n_estimators': 131, 'max_depth': 24, 'min_samples_split': 5, 'min_samples_leaf': 1, 'max_features': None, 'bootstrap': True}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:16:23,794] Trial 42 finished with value: 6.44 and parameters: {'n_estimators': 225, 'max_depth': 22, 'min_samples_split': 6, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:16:26,847] Trial 43 finished with value: 6.44 and parameters: {'n_estimators': 271, 'max_depth': 26, 'min_samples_split': 4, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:16:28,553] Trial 44 finished with value: 6.44 and parameters: {'n_estimators': 156, 'max_depth': 20, 'min_samples_split': 3, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:16:30,628] Trial 45 finished with value: 6.46 and parameters: {'n_estimators': 185, 'max_depth': 23, 'min_samples_split': 6, 'min_samples_leaf': 1, 'max_features': None, 'bootstrap': True}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:16:31,083] Trial 46 finished with value: 6.74 and parameters: {'n_estimators': 104, 'max_depth': 18, 'min_samples_split': 2, 'min_samples_leaf': 3, 'max_features': 'log2', 'bootstrap': True}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:16:35,495] Trial 47 finished with value: 6.44 and parameters: {'n_estimators': 422, 'max_depth': 30, 'min_samples_split': 5, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:16:36,133] Trial 48 finished with value: 6.65 and parameters: {'n_estimators': 168, 'max_depth': 21, 'min_samples_split': 6, 'min_samples_leaf': 3, 'max_features': 'sqrt', 'bootstrap': True}. Best is trial 17 with value: 6.43.
[I 2023-08-24 10:16:36,534] Trial 49 finished with value: 6.57 and parameters: {'n_estimators': 32, 'max_depth': 28, 'min_samples_split': 5, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 17 with value: 6.43.
Лучшие параметры: {'n_estimators': 102, 'max_depth': 18, 'min_samples_split': 6, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}
Лучшее значение метрики: 6.43
|   | n_estimators | max_depth | min_samples_split | min_samples_leaf | max_features | bootstrap | score |
|---|---|---|---|---|---|---|---|
| 0 | 102.0 | 18.0 | 6.0 | 3.0 | None | 1.0 | 6.43 |
| 1 | 102.0 | 11.0 | 5.0 | 3.0 | None | 1.0 | 6.43 |
| 2 | 107.0 | 27.0 | 5.0 | 3.0 | None | 1.0 | 6.43 |
<timed exec>:57: ExperimentalWarning: plot_param_importances is experimental (supported from v2.2.0). The interface can change in the future.
<timed exec>:73: ExperimentalWarning: plot_optimization_history is experimental (supported from v2.2.0). The interface can change in the future.
Wall time: 1min 24s
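One pitfall worth flagging in the result-collection code above: `DataFrame.append` was deprecated and then removed in pandas 2.0. A minimal sketch of the replacement pattern — accumulate plain dicts and build the frame once at the end (the values here are made up for illustration):

```python
import pandas as pd

# Hypothetical per-trial records (illustrative values only)
trial_records = [{'n_estimators': 102, 'score': 6.43},
                 {'n_estimators': 88, 'score': 6.44}]

rows = []
for record in trial_records:
    rows.append(record)            # accumulate plain dicts
results = pd.DataFrame(rows)       # build the dataframe once at the end

print(len(results))  # 2
```

Building from a list of dicts is also faster than repeated concatenation, since each `pd.concat` copies the whole accumulated frame.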
# Contour plot of the search landscape for the two key hyperparameters
fig = optuna.visualization.plot_contour(
    study, params=['max_depth', 'n_estimators'])
fig.show()
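To decide how to narrow the ranges, one can look at the spread of parameter values among the top trials. A minimal pandas sketch of the idea, using hypothetical values rather than the actual study results:

```python
import pandas as pd

# Hypothetical parameters of the best trials (illustrative numbers only)
top = pd.DataFrame({'n_estimators': [102, 88, 179, 160],
                    'max_depth': [18, 19, 22, 23]})

# Take the min/max among the best trials as the new, narrower search range
ranges = {col: (int(top[col].min()), int(top[col].max())) for col in top.columns}
print(ranges)  # {'n_estimators': (88, 179), 'max_depth': (18, 23)}
```

In practice a small margin around these bounds is kept so the next study can still explore slightly outside the current optimum.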
Let's narrow the parameter ranges for the next search round.
%%time
def objective(trial):
    # Search space for the hyperparameters
    n_estimators = trial.suggest_int('n_estimators', 14, 450)
    max_depth = trial.suggest_int('max_depth', 10, 26)
    min_samples_split = trial.suggest_int('min_samples_split', 3, 6)
    min_samples_leaf = trial.suggest_int('min_samples_leaf', 1, 3)
    max_features = trial.suggest_categorical('max_features', [None])
    bootstrap = trial.suggest_categorical('bootstrap', [True])
    # Build a random forest with the suggested parameters
    model = RandomForestRegressor(n_estimators=n_estimators,
                                  max_depth=max_depth,
                                  min_samples_split=min_samples_split,
                                  min_samples_leaf=min_samples_leaf,
                                  max_features=max_features,
                                  bootstrap=bootstrap,
                                  random_state=RANDOM_STATE,
                                  n_jobs=-1)
    # Evaluate the model with cross-validation (neg MAE is negated back via abs)
    score = round(abs(cross_val_score(model,
                                      features_train,
                                      target_train,
                                      cv=kf,
                                      scoring='neg_mean_absolute_error',
                                      n_jobs=-1)).mean(), 2)
    # Return the metric (already a scalar, no extra .mean() needed)
    return score
# Create a Study object and run the optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=50)
print('Лучшие параметры:', study.best_params)
print('Лучшее значение метрики:', study.best_value)
# Collect hyperparameter values and the metric from best_trials
# (DataFrame.append was removed in pandas 2.0, so build the frame from a list of dicts)
rows = []
for trial in study.best_trials:
    params = trial.params
    params['score'] = trial.value
    rows.append(params)
results = pd.DataFrame(rows,
                       columns=['n_estimators', 'max_depth',
                                'min_samples_split', 'min_samples_leaf',
                                'max_features', 'bootstrap', 'score'])
# Sort by the metric in ascending order: lower MAE is better
results = results.sort_values(by='score', ascending=True)
# Show the 10 best models
display(results.head(10))
# Plot hyperparameter importances
optuna.visualization.matplotlib.plot_param_importances(study)
plt.xlabel('Важность гиперпараметров',
fontsize=12,
color='DarkSlateGray')
plt.ylabel('Гиперпараметры',
fontsize=12,
color='DarkSlateGray')
plt.suptitle('Важность гиперпараметров',
fontsize=15,
color='DarkSlateGray')
plt.minorticks_on()
plt.grid(which='minor',
linestyle=':')
plt.grid(True)
# Plot the optimization history
optuna.visualization.matplotlib.plot_optimization_history(study)
plt.tight_layout()
plt.xlabel('Итерации',
fontsize=12,
color='DarkSlateGray')
plt.ylabel('Целевая метрика',
fontsize=12,
color='DarkSlateGray')
plt.suptitle('История оптимизации \n \n',
fontsize=15,
color='DarkSlateGray')
plt.minorticks_on()
plt.grid(which='minor',
linestyle=':')
plt.grid(True)
plt.show()
[I 2023-08-24 10:16:44,629] A new study created in memory with name: no-name-e5ecc5e4-5c56-477e-b241-b6f1db4aba45
[I 2023-08-24 10:16:47,215] Trial 0 finished with value: 6.46 and parameters: {'n_estimators': 211, 'max_depth': 25, 'min_samples_split': 4, 'min_samples_leaf': 1, 'max_features': None, 'bootstrap': True}. Best is trial 0 with value: 6.46.
[I 2023-08-24 10:16:50,149] Trial 1 finished with value: 6.45 and parameters: {'n_estimators': 262, 'max_depth': 16, 'min_samples_split': 5, 'min_samples_leaf': 1, 'max_features': None, 'bootstrap': True}. Best is trial 1 with value: 6.45.
[I 2023-08-24 10:16:50,958] Trial 2 finished with value: 6.44 and parameters: {'n_estimators': 84, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 2 with value: 6.44.
[I 2023-08-24 10:16:52,729] Trial 3 finished with value: 6.46 and parameters: {'n_estimators': 157, 'max_depth': 18, 'min_samples_split': 4, 'min_samples_leaf': 1, 'max_features': None, 'bootstrap': True}. Best is trial 2 with value: 6.44.
[I 2023-08-24 10:16:55,803] Trial 4 finished with value: 6.44 and parameters: {'n_estimators': 315, 'max_depth': 17, 'min_samples_split': 4, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 2 with value: 6.44.
[I 2023-08-24 10:16:59,289] Trial 5 finished with value: 6.45 and parameters: {'n_estimators': 355, 'max_depth': 23, 'min_samples_split': 3, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 2 with value: 6.44.
[I 2023-08-24 10:17:01,185] Trial 6 finished with value: 6.44 and parameters: {'n_estimators': 190, 'max_depth': 18, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 2 with value: 6.44.
[I 2023-08-24 10:17:02,535] Trial 7 finished with value: 6.45 and parameters: {'n_estimators': 148, 'max_depth': 13, 'min_samples_split': 6, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 2 with value: 6.44.
[I 2023-08-24 10:17:03,386] Trial 8 finished with value: 6.44 and parameters: {'n_estimators': 81, 'max_depth': 21, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 2 with value: 6.44.
[I 2023-08-24 10:17:04,882] Trial 9 finished with value: 6.44 and parameters: {'n_estimators': 157, 'max_depth': 18, 'min_samples_split': 3, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 2 with value: 6.44.
[I 2023-08-24 10:17:05,077] Trial 10 finished with value: 6.57 and parameters: {'n_estimators': 16, 'max_depth': 10, 'min_samples_split': 5, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 2 with value: 6.44.
[I 2023-08-24 10:17:09,008] Trial 11 finished with value: 6.44 and parameters: {'n_estimators': 431, 'max_depth': 14, 'min_samples_split': 4, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 2 with value: 6.44.
[I 2023-08-24 10:17:11,792] Trial 12 finished with value: 6.44 and parameters: {'n_estimators': 312, 'max_depth': 12, 'min_samples_split': 5, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 2 with value: 6.44.
[I 2023-08-24 10:17:15,241] Trial 13 finished with value: 6.44 and parameters: {'n_estimators': 363, 'max_depth': 15, 'min_samples_split': 4, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 2 with value: 6.44.
[I 2023-08-24 10:17:18,133] Trial 14 finished with value: 6.44 and parameters: {'n_estimators': 272, 'max_depth': 21, 'min_samples_split': 5, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 2 with value: 6.44.
[I 2023-08-24 10:17:18,904] Trial 15 finished with value: 6.44 and parameters: {'n_estimators': 80, 'max_depth': 10, 'min_samples_split': 3, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 2 with value: 6.44.
[I 2023-08-24 10:17:23,544] Trial 16 finished with value: 6.44 and parameters: {'n_estimators': 431, 'max_depth': 16, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 2 with value: 6.44.
[I 2023-08-24 10:17:23,784] Trial 17 finished with value: 6.58 and parameters: {'n_estimators': 15, 'max_depth': 21, 'min_samples_split': 5, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 2 with value: 6.44.
[I 2023-08-24 10:17:24,654] Trial 18 finished with value: 6.43 and parameters: {'n_estimators': 85, 'max_depth': 12, 'min_samples_split': 4, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:25,707] Trial 19 finished with value: 6.44 and parameters: {'n_estimators': 98, 'max_depth': 12, 'min_samples_split': 3, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:26,849] Trial 20 finished with value: 6.43 and parameters: {'n_estimators': 111, 'max_depth': 12, 'min_samples_split': 6, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:27,868] Trial 21 finished with value: 6.43 and parameters: {'n_estimators': 106, 'max_depth': 11, 'min_samples_split': 6, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:28,983] Trial 22 finished with value: 6.44 and parameters: {'n_estimators': 118, 'max_depth': 11, 'min_samples_split': 6, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:29,514] Trial 23 finished with value: 6.46 and parameters: {'n_estimators': 53, 'max_depth': 13, 'min_samples_split': 5, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:30,652] Trial 24 finished with value: 6.44 and parameters: {'n_estimators': 128, 'max_depth': 11, 'min_samples_split': 6, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:31,211] Trial 25 finished with value: 6.43 and parameters: {'n_estimators': 54, 'max_depth': 12, 'min_samples_split': 5, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:32,814] Trial 26 finished with value: 6.44 and parameters: {'n_estimators': 198, 'max_depth': 10, 'min_samples_split': 4, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:33,330] Trial 27 finished with value: 6.45 and parameters: {'n_estimators': 50, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:34,855] Trial 28 finished with value: 6.44 and parameters: {'n_estimators': 175, 'max_depth': 11, 'min_samples_split': 5, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:37,167] Trial 29 finished with value: 6.45 and parameters: {'n_estimators': 233, 'max_depth': 13, 'min_samples_split': 4, 'min_samples_leaf': 1, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:38,364] Trial 30 finished with value: 6.45 and parameters: {'n_estimators': 122, 'max_depth': 15, 'min_samples_split': 6, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:38,922] Trial 31 finished with value: 6.48 and parameters: {'n_estimators': 48, 'max_depth': 26, 'min_samples_split': 5, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:39,448] Trial 32 finished with value: 6.43 and parameters: {'n_estimators': 54, 'max_depth': 12, 'min_samples_split': 5, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:40,469] Trial 33 finished with value: 6.45 and parameters: {'n_estimators': 100, 'max_depth': 12, 'min_samples_split': 6, 'min_samples_leaf': 1, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:41,941] Trial 34 finished with value: 6.45 and parameters: {'n_estimators': 138, 'max_depth': 15, 'min_samples_split': 4, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:42,622] Trial 35 finished with value: 6.45 and parameters: {'n_estimators': 66, 'max_depth': 11, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:43,719] Trial 36 finished with value: 6.47 and parameters: {'n_estimators': 104, 'max_depth': 13, 'min_samples_split': 5, 'min_samples_leaf': 1, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:44,099] Trial 37 finished with value: 6.51 and parameters: {'n_estimators': 33, 'max_depth': 16, 'min_samples_split': 4, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:45,944] Trial 38 finished with value: 6.44 and parameters: {'n_estimators': 222, 'max_depth': 10, 'min_samples_split': 6, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:47,793] Trial 39 finished with value: 6.45 and parameters: {'n_estimators': 181, 'max_depth': 14, 'min_samples_split': 5, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:48,662] Trial 40 finished with value: 6.45 and parameters: {'n_estimators': 82, 'max_depth': 23, 'min_samples_split': 4, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:49,359] Trial 41 finished with value: 6.44 and parameters: {'n_estimators': 67, 'max_depth': 12, 'min_samples_split': 5, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:49,825] Trial 42 finished with value: 6.49 and parameters: {'n_estimators': 40, 'max_depth': 12, 'min_samples_split': 5, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:51,361] Trial 43 finished with value: 6.45 and parameters: {'n_estimators': 160, 'max_depth': 13, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:52,425] Trial 44 finished with value: 6.45 and parameters: {'n_estimators': 110, 'max_depth': 11, 'min_samples_split': 5, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:52,740] Trial 45 finished with value: 6.55 and parameters: {'n_estimators': 27, 'max_depth': 19, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:53,440] Trial 46 finished with value: 6.45 and parameters: {'n_estimators': 71, 'max_depth': 13, 'min_samples_split': 5, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:54,925] Trial 47 finished with value: 6.47 and parameters: {'n_estimators': 143, 'max_depth': 14, 'min_samples_split': 4, 'min_samples_leaf': 1, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:55,824] Trial 48 finished with value: 6.43 and parameters: {'n_estimators': 92, 'max_depth': 17, 'min_samples_split': 3, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
[I 2023-08-24 10:17:56,391] Trial 49 finished with value: 6.44 and parameters: {'n_estimators': 60, 'max_depth': 10, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 18 with value: 6.43.
Лучшие параметры: {'n_estimators': 85, 'max_depth': 12, 'min_samples_split': 4, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}
Лучшее значение метрики: 6.43
|   | n_estimators | max_depth | min_samples_split | min_samples_leaf | max_features | bootstrap | score |
|---|---|---|---|---|---|---|---|
| 0 | 85.0 | 12.0 | 4.0 | 3.0 | None | 1.0 | 6.43 |
| 1 | 111.0 | 12.0 | 6.0 | 3.0 | None | 1.0 | 6.43 |
| 2 | 106.0 | 11.0 | 6.0 | 3.0 | None | 1.0 | 6.43 |
| 3 | 54.0 | 12.0 | 5.0 | 2.0 | None | 1.0 | 6.43 |
| 4 | 54.0 | 12.0 | 5.0 | 2.0 | None | 1.0 | 6.43 |
| 5 | 92.0 | 17.0 | 3.0 | 3.0 | None | 1.0 | 6.43 |
<timed exec>:56: ExperimentalWarning: plot_param_importances is experimental (supported from v2.2.0). The interface can change in the future.
<timed exec>:72: ExperimentalWarning: plot_optimization_history is experimental (supported from v2.2.0). The interface can change in the future.
Wall time: 1min 13s
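A note on the sign convention in the objective above: `cross_val_score` with `scoring='neg_mean_absolute_error'` returns the negated MAE for each fold (so that "greater is better" holds for all scorers), which is why `abs(...)` is taken before averaging. A self-contained sketch on synthetic data — the data and model here are illustrative, not the project's:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic linear data with small gaussian noise
rng = np.random.RandomState(0)
X = rng.rand(200, 3)
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.1, size=200)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y,
                         cv=kf, scoring='neg_mean_absolute_error')
mae = abs(scores).mean()  # fold scores are negative; flip the sign to get MAE
print(round(mae, 2))
```

Equivalently one can write `-scores.mean()`; both give a positive MAE in the original units of the target.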
# Contour plot of the search landscape for the two key hyperparameters
fig = optuna.visualization.plot_contour(
    study, params=['max_depth', 'n_estimators'])
fig.show()
Now run the final round of hyperparameter tuning.
%%time
def objective(trial):
    # Search space for the hyperparameters
    n_estimators = trial.suggest_int('n_estimators', 150, 355)
    max_depth = trial.suggest_int('max_depth', 12, 26)
    min_samples_split = trial.suggest_int('min_samples_split', 3, 6)
    min_samples_leaf = trial.suggest_int('min_samples_leaf', 2, 3)
    max_features = trial.suggest_categorical('max_features', [None])
    bootstrap = trial.suggest_categorical('bootstrap', [True])
    # Build a random forest with the suggested parameters
    model = RandomForestRegressor(n_estimators=n_estimators,
                                  max_depth=max_depth,
                                  min_samples_split=min_samples_split,
                                  min_samples_leaf=min_samples_leaf,
                                  max_features=max_features,
                                  bootstrap=bootstrap,
                                  random_state=RANDOM_STATE,
                                  n_jobs=-1)
    # Evaluate the model with cross-validation (neg MAE is negated back via abs)
    score = round(abs(cross_val_score(model,
                                      features_train,
                                      target_train,
                                      cv=kf,
                                      scoring='neg_mean_absolute_error',
                                      n_jobs=-1)).mean(), 2)
    # Return the metric (already a scalar, no extra .mean() needed)
    return score
# Create a Study object and run the optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=200)
print('Лучшие параметры:', study.best_params)
print('Лучшее значение метрики:', study.best_value)
# Collect hyperparameter values and the metric from best_trials
# (DataFrame.append was removed in pandas 2.0, so build the frame from a list of dicts)
rows = []
for trial in study.best_trials:
    params = trial.params
    params['score'] = trial.value
    rows.append(params)
results = pd.DataFrame(rows,
                       columns=['n_estimators', 'max_depth',
                                'min_samples_split', 'min_samples_leaf',
                                'max_features', 'bootstrap', 'score'])
# Sort by the metric in ascending order: lower MAE is better
results = results.sort_values(by='score', ascending=True)
# Show the 10 best models
display(results.head(10))
# Plot hyperparameter importances
optuna.visualization.matplotlib.plot_param_importances(study)
plt.xlabel('Важность гиперпараметров',
fontsize=12,
color='DarkSlateGray')
plt.ylabel('Гиперпараметры',
fontsize=12,
color='DarkSlateGray')
plt.suptitle('Важность гиперпараметров',
fontsize=15,
color='DarkSlateGray')
plt.minorticks_on()
plt.grid(which='minor',
linestyle=':')
plt.grid(True)
# Plot the optimization history
optuna.visualization.matplotlib.plot_optimization_history(study)
plt.tight_layout()
plt.xlabel('Итерации',
fontsize=12,
color='DarkSlateGray')
plt.ylabel('Целевая метрика',
fontsize=12,
color='DarkSlateGray')
plt.suptitle('История оптимизации \n \n',
fontsize=15,
color='DarkSlateGray')
plt.minorticks_on()
plt.grid(which='minor',
linestyle=':')
plt.grid(True)
plt.show()
# Rebuild the model with the best hyperparameters found
forest = RandomForestRegressor(n_estimators=study.best_params['n_estimators'],
                               max_depth=study.best_params['max_depth'],
                               min_samples_split=study.best_params['min_samples_split'],
                               min_samples_leaf=study.best_params['min_samples_leaf'],
                               max_features=study.best_params['max_features'],
                               bootstrap=study.best_params['bootstrap'],
                               random_state=RANDOM_STATE,
                               n_jobs=-1)
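The same model can be constructed more compactly by unpacking `study.best_params` with `**`. A sketch assuming a dict of the same shape as `best_params` (the values below are hypothetical):

```python
from sklearn.ensemble import RandomForestRegressor

# Hypothetical dict with the same keys as study.best_params
best_params = {'n_estimators': 85, 'max_depth': 12, 'min_samples_split': 4,
               'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}

# ** unpacks the dict directly into keyword arguments
forest = RandomForestRegressor(**best_params, random_state=140823, n_jobs=-1)
print(forest.get_params()['max_depth'])  # 12
```

This avoids repeating each key by hand and stays correct if the search space gains or loses a parameter.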
[I 2023-08-24 10:17:58,391] A new study created in memory with name: no-name-9db2dd26-faab-49fa-8f6d-24d21c2b734b
[I 2023-08-24 10:18:01,598] Trial 0 finished with value: 6.44 and parameters: {'n_estimators': 319, 'max_depth': 25, 'min_samples_split': 5, 'min_samples_leaf': 3, 'max_features': None, 'bootstrap': True}. Best is trial 0 with value: 6.44.
[... trials 1-61 omitted: values stayed in the 6.44-6.45 range, best remained trial 0 ...]
[I 2023-08-24 10:20:40,696] Trial 62 finished with value: 6.43 and parameters: {'n_estimators': 272, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[... trials 63-143 omitted: values of 6.43-6.44 with no improvement over trial 62 ...]
[I 2023-08-24 10:24:18,342] Trial 144 finished with value: 6.43 and parameters: {'n_estimators': 282, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:24:20,954] Trial 145 finished with value: 6.44 and parameters: {'n_estimators': 269, 'max_depth': 15, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:24:23,559] Trial 146 finished with value: 6.44 and parameters: {'n_estimators': 276, 'max_depth': 13, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:24:26,029] Trial 147 finished with value: 6.43 and parameters: {'n_estimators': 259, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:24:28,748] Trial 148 finished with value: 6.43 and parameters: {'n_estimators': 287, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:24:31,233] Trial 149 finished with value: 6.44 and parameters: {'n_estimators': 249, 'max_depth': 15, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:24:33,794] Trial 150 finished with value: 6.44 and parameters: {'n_estimators': 273, 'max_depth': 13, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:24:36,356] Trial 151 finished with value: 6.43 and parameters: {'n_estimators': 260, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:24:38,960] Trial 152 finished with value: 6.44 and parameters: {'n_estimators': 266, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:24:41,472] Trial 153 finished with value: 6.43 and parameters: {'n_estimators': 254, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:24:44,220] Trial 154 finished with value: 6.43 and parameters: {'n_estimators': 280, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:24:46,850] Trial 155 finished with value: 6.44 and parameters: {'n_estimators': 260, 'max_depth': 15, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:24:49,504] Trial 156 finished with value: 6.44 and parameters: {'n_estimators': 269, 'max_depth': 13, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:24:52,191] Trial 157 finished with value: 6.43 and parameters: {'n_estimators': 275, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:24:54,998] Trial 158 finished with value: 6.44 and parameters: {'n_estimators': 290, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:24:57,586] Trial 159 finished with value: 6.44 and parameters: {'n_estimators': 264, 'max_depth': 15, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:25:00,288] Trial 160 finished with value: 6.43 and parameters: {'n_estimators': 280, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:25:03,061] Trial 161 finished with value: 6.43 and parameters: {'n_estimators': 286, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:25:05,902] Trial 162 finished with value: 6.44 and parameters: {'n_estimators': 296, 'max_depth': 13, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:25:08,700] Trial 163 finished with value: 6.43 and parameters: {'n_estimators': 277, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:25:11,319] Trial 164 finished with value: 6.43 and parameters: {'n_estimators': 272, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:25:14,151] Trial 165 finished with value: 6.44 and parameters: {'n_estimators': 284, 'max_depth': 15, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:25:16,990] Trial 166 finished with value: 6.43 and parameters: {'n_estimators': 291, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:25:19,519] Trial 167 finished with value: 6.44 and parameters: {'n_estimators': 268, 'max_depth': 13, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:25:21,978] Trial 168 finished with value: 6.44 and parameters: {'n_estimators': 253, 'max_depth': 15, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:25:24,495] Trial 169 finished with value: 6.44 and parameters: {'n_estimators': 261, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:25:27,002] Trial 170 finished with value: 6.44 and parameters: {'n_estimators': 271, 'max_depth': 13, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:25:29,778] Trial 171 finished with value: 6.43 and parameters: {'n_estimators': 277, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:25:32,653] Trial 172 finished with value: 6.43 and parameters: {'n_estimators': 293, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:25:35,409] Trial 173 finished with value: 6.43 and parameters: {'n_estimators': 275, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:25:38,239] Trial 174 finished with value: 6.44 and parameters: {'n_estimators': 281, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:25:40,820] Trial 175 finished with value: 6.43 and parameters: {'n_estimators': 268, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:25:43,371] Trial 176 finished with value: 6.44 and parameters: {'n_estimators': 264, 'max_depth': 15, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:25:46,136] Trial 177 finished with value: 6.44 and parameters: {'n_estimators': 283, 'max_depth': 13, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:25:48,732] Trial 178 finished with value: 6.43 and parameters: {'n_estimators': 274, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:25:51,492] Trial 179 finished with value: 6.44 and parameters: {'n_estimators': 277, 'max_depth': 15, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:25:54,662] Trial 180 finished with value: 6.44 and parameters: {'n_estimators': 307, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:25:57,507] Trial 181 finished with value: 6.43 and parameters: {'n_estimators': 279, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:26:00,159] Trial 182 finished with value: 6.43 and parameters: {'n_estimators': 272, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:26:02,951] Trial 183 finished with value: 6.43 and parameters: {'n_estimators': 287, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:26:05,648] Trial 184 finished with value: 6.44 and parameters: {'n_estimators': 280, 'max_depth': 13, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:26:08,267] Trial 185 finished with value: 6.43 and parameters: {'n_estimators': 256, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:26:10,906] Trial 186 finished with value: 6.44 and parameters: {'n_estimators': 266, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:26:13,784] Trial 187 finished with value: 6.44 and parameters: {'n_estimators': 284, 'max_depth': 15, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:26:16,438] Trial 188 finished with value: 6.44 and parameters: {'n_estimators': 271, 'max_depth': 13, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:26:19,182] Trial 189 finished with value: 6.43 and parameters: {'n_estimators': 277, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:26:21,690] Trial 190 finished with value: 6.44 and parameters: {'n_estimators': 249, 'max_depth': 15, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:26:24,366] Trial 191 finished with value: 6.43 and parameters: {'n_estimators': 268, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:26:26,780] Trial 192 finished with value: 6.44 and parameters: {'n_estimators': 261, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:26:29,617] Trial 193 finished with value: 6.43 and parameters: {'n_estimators': 274, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:26:32,330] Trial 194 finished with value: 6.43 and parameters: {'n_estimators': 279, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:26:34,891] Trial 195 finished with value: 6.44 and parameters: {'n_estimators': 267, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:26:37,356] Trial 196 finished with value: 6.44 and parameters: {'n_estimators': 258, 'max_depth': 13, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:26:39,992] Trial 197 finished with value: 6.43 and parameters: {'n_estimators': 273, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:26:42,650] Trial 198 finished with value: 6.44 and parameters: {'n_estimators': 263, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
[I 2023-08-24 10:26:45,488] Trial 199 finished with value: 6.44 and parameters: {'n_estimators': 283, 'max_depth': 15, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}. Best is trial 62 with value: 6.43.
Best parameters: {'n_estimators': 272, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': None, 'bootstrap': True}
Best metric value: 6.43
| | n_estimators | max_depth | min_samples_split | min_samples_leaf | max_features | bootstrap | score |
|---|---|---|---|---|---|---|---|
| 0 | 272.0 | 14.0 | 6.0 | 2.0 | None | 1.0 | 6.43 |
| 43 | 291.0 | 14.0 | 6.0 | 2.0 | None | 1.0 | 6.43 |
| 31 | 257.0 | 14.0 | 6.0 | 2.0 | None | 1.0 | 6.43 |
| 32 | 282.0 | 14.0 | 6.0 | 2.0 | None | 1.0 | 6.43 |
| 33 | 259.0 | 14.0 | 6.0 | 2.0 | None | 1.0 | 6.43 |
| 34 | 287.0 | 14.0 | 6.0 | 2.0 | None | 1.0 | 6.43 |
| 35 | 260.0 | 14.0 | 6.0 | 2.0 | None | 1.0 | 6.43 |
| 36 | 254.0 | 14.0 | 6.0 | 2.0 | None | 1.0 | 6.43 |
| 37 | 280.0 | 14.0 | 6.0 | 2.0 | None | 1.0 | 6.43 |
| 38 | 275.0 | 14.0 | 6.0 | 2.0 | None | 1.0 | 6.43 |
<timed exec>:56: ExperimentalWarning: plot_param_importances is experimental (supported from v2.2.0). The interface can change in the future.
<timed exec>:72: ExperimentalWarning: plot_optimization_history is experimental (supported from v2.2.0). The interface can change in the future.
Wall time: 8min 49s
# Draw the contour plot for the two key hyperparameters
fig = optuna.visualization.plot_contour(study, params=["max_depth", "n_estimators"])
fig.show()
forest
RandomForestRegressor(max_depth=14, max_features=None, min_samples_leaf=2,
min_samples_split=6, n_estimators=272, n_jobs=-1,
random_state=140823)
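The tuned forest above can be refitted and checked against held-out data. A minimal sketch on synthetic data (the real `features_train`/`target_train` are built earlier in the notebook, so `make_regression` stands in for them here):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the project data
X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=140823)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=140823)

# Rebuild the forest with the best parameters found by Optuna
forest = RandomForestRegressor(n_estimators=272, max_depth=14,
                               min_samples_split=6, min_samples_leaf=2,
                               max_features=None, bootstrap=True,
                               n_jobs=-1, random_state=140823)
forest.fit(X_tr, y_tr)

# Hold-out MAE for the refitted model
score = mean_absolute_error(y_te, forest.predict(X_te))
print(f"Hold-out MAE: {score:.2f}")
```

On the real data the same refit-and-score step would use the notebook's train/test split instead of the synthetic one.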
%%time
def objective(trial):
    # Define the search space
    n_estimators = trial.suggest_int('n_estimators', 10, 500)
    max_depth = trial.suggest_int('max_depth', 2, 30)
    min_child_samples = trial.suggest_int('min_child_samples', 1, 8)
    learning_rate = trial.suggest_float('learning_rate', 0.01, 0.1, step=0.01)
    subsample = trial.suggest_float('subsample', 0.5, 1.0, step=0.01)
    colsample_bytree = trial.suggest_float('colsample_bytree', 0.5, 1.0, step=0.01)
    # Build the model with the sampled parameters
    model = LGBMRegressor(n_estimators=n_estimators,
                          max_depth=max_depth,
                          min_child_samples=min_child_samples,
                          learning_rate=learning_rate,
                          subsample=subsample,
                          colsample_bytree=colsample_bytree,
                          random_state=RANDOM_STATE)
    # Score the model with cross-validation; 'neg_mean_absolute_error'
    # is negated, so take abs() before averaging
    score = round(abs(cross_val_score(model,
                                      features_train,
                                      target_train,
                                      cv=kf,
                                      scoring='neg_mean_absolute_error',
                                      n_jobs=-1)).mean(), 2)
    # score is already a scalar, so return it directly
    return score
# Create a Study object and run the optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=50)
print('Best parameters:', study.best_params)
print('Best metric value:', study.best_value)
# Collect the hyperparameter values and the metric into a dataframe
# (DataFrame.append was removed in pandas 2.0, so build the frame from a list)
rows = []
for trial in study.best_trials:
    params = trial.params
    params['score'] = trial.value
    rows.append(params)
results = pd.DataFrame(rows, columns=['n_estimators', 'max_depth',
                                      'min_child_samples', 'learning_rate',
                                      'subsample', 'colsample_bytree', 'score'])
# Sort ascending: a lower MAE means a better model
results = results.sort_values(by='score')
# Show the 10 best models
display(results.head(10))
# Plot hyperparameter importances
optuna.visualization.matplotlib.plot_param_importances(study)
plt.xlabel('Hyperparameter importance',
           fontsize=12,
           color='DarkSlateGray')
plt.ylabel('Hyperparameters',
           fontsize=12,
           color='DarkSlateGray')
plt.suptitle('Hyperparameter importance',
             fontsize=15,
             color='DarkSlateGray')
plt.minorticks_on()
plt.grid(which='minor',
         linestyle=':')
plt.grid(True)
# Plot the optimization history
optuna.visualization.matplotlib.plot_optimization_history(study)
plt.tight_layout()
plt.xlabel('Trials',
           fontsize=12,
           color='DarkSlateGray')
plt.ylabel('Objective value',
           fontsize=12,
           color='DarkSlateGray')
plt.suptitle('Optimization history\n\n',
             fontsize=15,
             color='DarkSlateGray')
plt.minorticks_on()
plt.grid(which='minor',
         linestyle=':')
plt.grid(True)
plt.show()
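Collecting trial results row by row is a common pattern, but `DataFrame.append` was removed in pandas 2.0, so the portable approach is to accumulate dicts in a list and build the frame once. A minimal, self-contained sketch (the rows are hard-coded here for illustration; in the notebook each dict comes from `trial.params` plus `trial.value` over `study.best_trials`):

```python
import pandas as pd

# Hypothetical trial records standing in for study.best_trials
rows = [
    {'n_estimators': 417, 'max_depth': 6, 'score': 6.30},
    {'n_estimators': 272, 'max_depth': 14, 'score': 6.43},
]

# Build the frame in one call instead of appending inside a loop
results = pd.DataFrame(rows)

# Lower MAE is better, so sort ascending to put the best model first
results = results.sort_values(by='score').reset_index(drop=True)
print(results.head())
```

The same frame can then be sliced with `head(10)` to display the top models, exactly as done above.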
[I 2023-08-24 10:26:48,440] A new study created in memory with name: no-name-7dc85ea7-653b-4253-aba0-5241b419d609
[I 2023-08-24 10:26:49,709] Trial 0 finished with value: 6.49 and parameters: {'n_estimators': 494, 'max_depth': 20, 'min_child_samples': 7, 'learning_rate': 0.060000000000000005, 'subsample': 0.8200000000000001, 'colsample_bytree': 0.75}. Best is trial 0 with value: 6.49.
... (trials 1–35 omitted: the best value improved to 6.33 at trial 4 and to 6.31 at trial 19) ...
[I 2023-08-24 10:27:06,126] Trial 36 finished with value: 6.3 and parameters: {'n_estimators': 417, 'max_depth': 6, 'min_child_samples': 4, 'learning_rate': 0.02, 'subsample': 0.71, 'colsample_bytree': 0.8200000000000001}. Best is trial 36 with value: 6.3.
... (trials 37–49 omitted: trial 36 remained the best) ...
Best parameters: {'n_estimators': 417, 'max_depth': 6, 'min_child_samples': 4, 'learning_rate': 0.02, 'subsample': 0.71, 'colsample_bytree': 0.8200000000000001}
Best metric value: 6.3
| | n_estimators | max_depth | min_child_samples | learning_rate | subsample | colsample_bytree | score |
|---|---|---|---|---|---|---|---|
| 0 | 417.0 | 6.0 | 4.0 | 0.02 | 0.71 | 0.82 | 6.3 |
<timed exec>:57: ExperimentalWarning: plot_param_importances is experimental (supported from v2.2.0). The interface can change in the future.
<timed exec>:73: ExperimentalWarning: plot_optimization_history is experimental (supported from v2.2.0). The interface can change in the future.
Wall time: 28.6 s
# Draw the contour plot
fig = optuna.visualization.plot_contour(study, params=["max_depth", "n_estimators"])
fig.show()
Narrowing the parameter search intervals based on the first study.
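The narrowing step can be sketched as a small helper that shrinks each interval around the previous study's best value. The function name, the shrink factor, and the resulting bounds are illustrative assumptions, not part of the notebook's code:

```python
# Hypothetical helper (not from the notebook): shrink a search interval
# around the best value found by the previous study.
def narrowed_range(best, low, high, shrink=0.5):
    """Return [low, high] shrunk around `best` by the given factor."""
    half = (high - low) * shrink / 2
    return max(low, best - half), min(high, best + half)

# The previous best n_estimators was 417 inside the original [100, 500] interval
print(narrowed_range(417.0, 100.0, 500.0))  # -> (317.0, 500.0)
```

In the notebook the new bounds are picked by eye from the contour plots instead, which achieves the same goal.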
%%time
def objective(trial):
    # Define the search space
    n_estimators = trial.suggest_int('n_estimators', 100, 400)
    max_depth = trial.suggest_int('max_depth', 1, 10)
    min_child_samples = trial.suggest_int('min_child_samples', 6, 10)
    learning_rate = trial.suggest_float('learning_rate', 0.01, 0.1, step=0.01)
    subsample = trial.suggest_float('subsample', 0.5, 0.9, step=0.01)
    colsample_bytree = trial.suggest_float('colsample_bytree', 0.7, 1, step=0.1)
    # Build the model with the sampled parameters
    model = LGBMRegressor(n_estimators=n_estimators,
                          max_depth=max_depth,
                          min_child_samples=min_child_samples,
                          learning_rate=learning_rate,
                          subsample=subsample,
                          colsample_bytree=colsample_bytree,
                          random_state=RANDOM_STATE)
    # Evaluate the model with cross-validation (mean absolute error)
    score = round(abs(cross_val_score(model,
                                      features_train,
                                      target_train,
                                      cv=kf,
                                      scoring='neg_mean_absolute_error',
                                      n_jobs=-1)).mean(), 2)
    # score is already a scalar, so return it directly
    return score
# Create a Study object and run the optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=50)
print('Best parameters:', study.best_params)
print('Best metric value:', study.best_value)
# Build a dataframe with the hyperparameter values and the metric
results = pd.DataFrame({'n_estimators': [],
                        'max_depth': [],
                        'min_child_samples': [],
                        'learning_rate': [],
                        'subsample': [],
                        'colsample_bytree': [],
                        'score': []})
# Fill it with the data from best_trials
for trial in study.best_trials:
    params = trial.params
    params['score'] = trial.value
    # DataFrame.append() was removed in pandas 2.0; use pd.concat() instead
    results = pd.concat([results, pd.DataFrame([params])], ignore_index=True)
# Sort the dataframe by metric value in descending order
results = results.sort_values(by='score', ascending=False)
# Show the 10 best models
display(results.head(10))
# Hyperparameter importance plot
optuna.visualization.matplotlib.plot_param_importances(study)
plt.xlabel('Hyperparameter importance',
           fontsize=12,
           color='DarkSlateGray')
plt.ylabel('Hyperparameters',
           fontsize=12,
           color='DarkSlateGray')
plt.suptitle('Hyperparameter importance',
             fontsize=15,
             color='DarkSlateGray')
plt.minorticks_on()
plt.grid(which='minor',
         linestyle=':')
plt.grid(True)
# Optimization history plot
optuna.visualization.matplotlib.plot_optimization_history(study)
plt.tight_layout()
plt.xlabel('Iterations',
           fontsize=12,
           color='DarkSlateGray')
plt.ylabel('Objective metric',
           fontsize=12,
           color='DarkSlateGray')
plt.suptitle('Optimization history \n \n',
             fontsize=15,
             color='DarkSlateGray')
plt.minorticks_on()
plt.grid(which='minor',
         linestyle=':')
plt.grid(True)
plt.show()
[I 2023-08-24 10:27:17,065] A new study created in memory with name: no-name-b9e8db1d-6e83-4904-b89d-56697bd66880
[I 2023-08-24 10:27:17,198] Trial 0 finished with value: 6.35 and parameters: {'n_estimators': 120, 'max_depth': 4, 'min_child_samples': 8, 'learning_rate': 0.04, 'subsample': 0.8400000000000001, 'colsample_bytree': 0.7999999999999999}. Best is trial 0 with value: 6.35.
[I 2023-08-24 10:27:17,531] Trial 1 finished with value: 6.41 and parameters: {'n_estimators': 241, 'max_depth': 5, 'min_child_samples': 7, 'learning_rate': 0.08, 'subsample': 0.59, 'colsample_bytree': 0.7}. Best is trial 0 with value: 6.35.
[I 2023-08-24 10:27:18,420] Trial 2 finished with value: 6.56 and parameters: {'n_estimators': 372, 'max_depth': 9, 'min_child_samples': 8, 'learning_rate': 0.08, 'subsample': 0.6799999999999999, 'colsample_bytree': 1.0}. Best is trial 0 with value: 6.35.
[I 2023-08-24 10:27:18,669] Trial 3 finished with value: 6.57 and parameters: {'n_estimators': 101, 'max_depth': 10, 'min_child_samples': 7, 'learning_rate': 0.02, 'subsample': 0.88, 'colsample_bytree': 0.7}. Best is trial 0 with value: 6.35.
[I 2023-08-24 10:27:18,753] Trial 4 finished with value: 6.56 and parameters: {'n_estimators': 255, 'max_depth': 1, 'min_child_samples': 8, 'learning_rate': 0.06999999999999999, 'subsample': 0.88, 'colsample_bytree': 0.8999999999999999}. Best is trial 0 with value: 6.35.
[I 2023-08-24 10:27:19,194] Trial 5 finished with value: 6.56 and parameters: {'n_estimators': 235, 'max_depth': 9, 'min_child_samples': 10, 'learning_rate': 0.09, 'subsample': 0.7, 'colsample_bytree': 0.8999999999999999}. Best is trial 0 with value: 6.35.
[I 2023-08-24 10:27:19,502] Trial 6 finished with value: 6.48 and parameters: {'n_estimators': 112, 'max_depth': 10, 'min_child_samples': 10, 'learning_rate': 0.02, 'subsample': 0.87, 'colsample_bytree': 1.0}. Best is trial 0 with value: 6.35.
[I 2023-08-24 10:27:20,048] Trial 7 finished with value: 6.53 and parameters: {'n_estimators': 369, 'max_depth': 6, 'min_child_samples': 6, 'learning_rate': 0.06999999999999999, 'subsample': 0.62, 'colsample_bytree': 0.7999999999999999}. Best is trial 0 with value: 6.35.
[I 2023-08-24 10:27:20,337] Trial 8 finished with value: 6.43 and parameters: {'n_estimators': 154, 'max_depth': 8, 'min_child_samples': 6, 'learning_rate': 0.06999999999999999, 'subsample': 0.8500000000000001, 'colsample_bytree': 0.7}. Best is trial 0 with value: 6.35.
[I 2023-08-24 10:27:20,488] Trial 9 finished with value: 6.4 and parameters: {'n_estimators': 119, 'max_depth': 5, 'min_child_samples': 9, 'learning_rate': 0.09, 'subsample': 0.76, 'colsample_bytree': 0.8999999999999999}. Best is trial 0 with value: 6.35.
[I 2023-08-24 10:27:20,624] Trial 10 finished with value: 6.49 and parameters: {'n_estimators': 193, 'max_depth': 2, 'min_child_samples': 9, 'learning_rate': 0.04, 'subsample': 0.51, 'colsample_bytree': 0.7999999999999999}. Best is trial 0 with value: 6.35.
[I 2023-08-24 10:27:20,825] Trial 11 finished with value: 6.29 and parameters: {'n_estimators': 163, 'max_depth': 4, 'min_child_samples': 9, 'learning_rate': 0.04, 'subsample': 0.78, 'colsample_bytree': 0.8999999999999999}. Best is trial 11 with value: 6.29.
[I 2023-08-24 10:27:20,983] Trial 12 finished with value: 6.37 and parameters: {'n_estimators': 164, 'max_depth': 3, 'min_child_samples': 9, 'learning_rate': 0.04, 'subsample': 0.79, 'colsample_bytree': 0.7999999999999999}. Best is trial 11 with value: 6.29.
[I 2023-08-24 10:27:21,266] Trial 13 finished with value: 6.28 and parameters: {'n_estimators': 300, 'max_depth': 4, 'min_child_samples': 8, 'learning_rate': 0.04, 'subsample': 0.78, 'colsample_bytree': 0.8999999999999999}. Best is trial 13 with value: 6.28.
[I 2023-08-24 10:27:21,844] Trial 14 finished with value: 6.49 and parameters: {'n_estimators': 301, 'max_depth': 7, 'min_child_samples': 9, 'learning_rate': 0.05, 'subsample': 0.76, 'colsample_bytree': 1.0}. Best is trial 13 with value: 6.28.
[I 2023-08-24 10:27:22,233] Trial 15 finished with value: 6.51 and parameters: {'n_estimators': 314, 'max_depth': 4, 'min_child_samples': 7, 'learning_rate': 0.01, 'subsample': 0.79, 'colsample_bytree': 0.8999999999999999}. Best is trial 13 with value: 6.28.
[I 2023-08-24 10:27:22,484] Trial 16 finished with value: 6.29 and parameters: {'n_estimators': 306, 'max_depth': 3, 'min_child_samples': 10, 'learning_rate': 0.05, 'subsample': 0.73, 'colsample_bytree': 0.8999999999999999}. Best is trial 13 with value: 6.28.
[I 2023-08-24 10:27:22,910] Trial 17 finished with value: 6.37 and parameters: {'n_estimators': 203, 'max_depth': 6, 'min_child_samples': 9, 'learning_rate': 0.03, 'subsample': 0.65, 'colsample_bytree': 1.0}. Best is trial 13 with value: 6.28.
[I 2023-08-24 10:27:23,040] Trial 18 finished with value: 6.58 and parameters: {'n_estimators': 275, 'max_depth': 1, 'min_child_samples': 8, 'learning_rate': 0.060000000000000005, 'subsample': 0.81, 'colsample_bytree': 0.8999999999999999}. Best is trial 13 with value: 6.28.
[I 2023-08-24 10:27:23,274] Trial 19 finished with value: 6.3 and parameters: {'n_estimators': 336, 'max_depth': 3, 'min_child_samples': 7, 'learning_rate': 0.03, 'subsample': 0.73, 'colsample_bytree': 0.7999999999999999}. Best is trial 13 with value: 6.28.
[I 2023-08-24 10:27:23,538] Trial 20 finished with value: 6.68 and parameters: {'n_estimators': 207, 'max_depth': 4, 'min_child_samples': 8, 'learning_rate': 0.01, 'subsample': 0.8200000000000001, 'colsample_bytree': 1.0}. Best is trial 13 with value: 6.28.
[I 2023-08-24 10:27:23,764] Trial 21 finished with value: 6.29 and parameters: {'n_estimators': 290, 'max_depth': 3, 'min_child_samples': 10, 'learning_rate': 0.05, 'subsample': 0.73, 'colsample_bytree': 0.8999999999999999}. Best is trial 13 with value: 6.28.
[I 2023-08-24 10:27:23,946] Trial 22 finished with value: 6.31 and parameters: {'n_estimators': 332, 'max_depth': 2, 'min_child_samples': 10, 'learning_rate': 0.060000000000000005, 'subsample': 0.75, 'colsample_bytree': 0.8999999999999999}. Best is trial 13 with value: 6.28.
[I 2023-08-24 10:27:24,362] Trial 23 finished with value: 6.4 and parameters: {'n_estimators': 342, 'max_depth': 5, 'min_child_samples': 10, 'learning_rate': 0.05, 'subsample': 0.6799999999999999, 'colsample_bytree': 0.8999999999999999}. Best is trial 13 with value: 6.28.
[I 2023-08-24 10:27:24,518] Trial 24 finished with value: 6.5 and parameters: {'n_estimators': 249, 'max_depth': 2, 'min_child_samples': 9, 'learning_rate': 0.03, 'subsample': 0.72, 'colsample_bytree': 0.8999999999999999}. Best is trial 13 with value: 6.28.
[I 2023-08-24 10:27:24,791] Trial 25 finished with value: 6.27 and parameters: {'n_estimators': 278, 'max_depth': 4, 'min_child_samples': 9, 'learning_rate': 0.04, 'subsample': 0.77, 'colsample_bytree': 0.7999999999999999}. Best is trial 25 with value: 6.27.
[I 2023-08-24 10:27:25,055] Trial 26 finished with value: 6.26 and parameters: {'n_estimators': 269, 'max_depth': 4, 'min_child_samples': 8, 'learning_rate': 0.04, 'subsample': 0.77, 'colsample_bytree': 0.7999999999999999}. Best is trial 26 with value: 6.26.
[I 2023-08-24 10:27:25,794] Trial 27 finished with value: 6.33 and parameters: {'n_estimators': 396, 'max_depth': 7, 'min_child_samples': 8, 'learning_rate': 0.02, 'subsample': 0.8300000000000001, 'colsample_bytree': 0.7999999999999999}. Best is trial 26 with value: 6.26.
[I 2023-08-24 10:27:26,142] Trial 28 finished with value: 6.3 and parameters: {'n_estimators': 267, 'max_depth': 5, 'min_child_samples': 8, 'learning_rate': 0.03, 'subsample': 0.8, 'colsample_bytree': 0.7999999999999999}. Best is trial 26 with value: 6.26.
[I 2023-08-24 10:27:26,538] Trial 29 finished with value: 6.3 and parameters: {'n_estimators': 226, 'max_depth': 6, 'min_child_samples': 7, 'learning_rate': 0.04, 'subsample': 0.9, 'colsample_bytree': 0.7}. Best is trial 26 with value: 6.26.
[I 2023-08-24 10:27:26,806] Trial 30 finished with value: 6.34 and parameters: {'n_estimators': 281, 'max_depth': 4, 'min_child_samples': 8, 'learning_rate': 0.060000000000000005, 'subsample': 0.8500000000000001, 'colsample_bytree': 0.7999999999999999}. Best is trial 26 with value: 6.26.
[I 2023-08-24 10:27:27,010] Trial 31 finished with value: 6.31 and parameters: {'n_estimators': 156, 'max_depth': 4, 'min_child_samples': 9, 'learning_rate': 0.04, 'subsample': 0.77, 'colsample_bytree': 0.7999999999999999}. Best is trial 26 with value: 6.26.
[I 2023-08-24 10:27:27,342] Trial 32 finished with value: 6.32 and parameters: {'n_estimators': 263, 'max_depth': 5, 'min_child_samples': 9, 'learning_rate': 0.03, 'subsample': 0.78, 'colsample_bytree': 0.7999999999999999}. Best is trial 26 with value: 6.26.
[I 2023-08-24 10:27:27,543] Trial 33 finished with value: 6.28 and parameters: {'n_estimators': 178, 'max_depth': 4, 'min_child_samples': 8, 'learning_rate': 0.04, 'subsample': 0.69, 'colsample_bytree': 0.7}. Best is trial 26 with value: 6.26.
[I 2023-08-24 10:27:27,723] Trial 34 finished with value: 6.31 and parameters: {'n_estimators': 224, 'max_depth': 3, 'min_child_samples': 8, 'learning_rate': 0.04, 'subsample': 0.64, 'colsample_bytree': 0.7}. Best is trial 26 with value: 6.26.
[I 2023-08-24 10:27:28,013] Trial 35 finished with value: 6.28 and parameters: {'n_estimators': 323, 'max_depth': 4, 'min_child_samples': 7, 'learning_rate': 0.05, 'subsample': 0.7, 'colsample_bytree': 0.7}. Best is trial 26 with value: 6.26.
[I 2023-08-24 10:27:28,203] Trial 36 finished with value: 6.52 and parameters: {'n_estimators': 356, 'max_depth': 2, 'min_child_samples': 8, 'learning_rate': 0.02, 'subsample': 0.59, 'colsample_bytree': 0.7}. Best is trial 26 with value: 6.26.
[I 2023-08-24 10:27:28,572] Trial 37 finished with value: 6.29 and parameters: {'n_estimators': 294, 'max_depth': 5, 'min_child_samples': 8, 'learning_rate': 0.03, 'subsample': 0.6799999999999999, 'colsample_bytree': 0.7}. Best is trial 26 with value: 6.26.
[I 2023-08-24 10:27:28,770] Trial 38 finished with value: 6.32 and parameters: {'n_estimators': 246, 'max_depth': 3, 'min_child_samples': 7, 'learning_rate': 0.060000000000000005, 'subsample': 0.66, 'colsample_bytree': 0.7999999999999999}. Best is trial 26 with value: 6.26.
[I 2023-08-24 10:27:29,079] Trial 39 finished with value: 6.5 and parameters: {'n_estimators': 182, 'max_depth': 6, 'min_child_samples': 8, 'learning_rate': 0.09999999999999999, 'subsample': 0.75, 'colsample_bytree': 0.7}. Best is trial 26 with value: 6.26.
[I 2023-08-24 10:27:29,248] Trial 40 finished with value: 6.56 and parameters: {'n_estimators': 131, 'max_depth': 4, 'min_child_samples': 8, 'learning_rate': 0.02, 'subsample': 0.54, 'colsample_bytree': 0.7}. Best is trial 26 with value: 6.26.
[I 2023-08-24 10:27:29,531] Trial 41 finished with value: 6.27 and parameters: {'n_estimators': 312, 'max_depth': 4, 'min_child_samples': 7, 'learning_rate': 0.05, 'subsample': 0.7, 'colsample_bytree': 0.7}. Best is trial 26 with value: 6.26.
[I 2023-08-24 10:27:29,798] Trial 42 finished with value: 6.29 and parameters: {'n_estimators': 275, 'max_depth': 4, 'min_child_samples': 6, 'learning_rate': 0.04, 'subsample': 0.71, 'colsample_bytree': 0.7}. Best is trial 26 with value: 6.26.
[I 2023-08-24 10:27:30,176] Trial 43 finished with value: 6.32 and parameters: {'n_estimators': 312, 'max_depth': 5, 'min_child_samples': 7, 'learning_rate': 0.05, 'subsample': 0.69, 'colsample_bytree': 0.7}. Best is trial 26 with value: 6.26.
[I 2023-08-24 10:27:30,397] Trial 44 finished with value: 6.31 and parameters: {'n_estimators': 289, 'max_depth': 3, 'min_child_samples': 7, 'learning_rate': 0.04, 'subsample': 0.61, 'colsample_bytree': 0.7999999999999999}. Best is trial 26 with value: 6.26.
[I 2023-08-24 10:27:30,707] Trial 45 finished with value: 6.34 and parameters: {'n_estimators': 233, 'max_depth': 5, 'min_child_samples': 8, 'learning_rate': 0.05, 'subsample': 0.74, 'colsample_bytree': 0.7999999999999999}. Best is trial 26 with value: 6.26.
[I 2023-08-24 10:27:31,166] Trial 46 finished with value: 6.53 and parameters: {'n_estimators': 254, 'max_depth': 7, 'min_child_samples': 6, 'learning_rate': 0.06999999999999999, 'subsample': 0.67, 'colsample_bytree': 0.7}. Best is trial 26 with value: 6.26.
[I 2023-08-24 10:27:31,351] Trial 47 finished with value: 6.34 and parameters: {'n_estimators': 140, 'max_depth': 4, 'min_child_samples': 7, 'learning_rate': 0.04, 'subsample': 0.7, 'colsample_bytree': 0.7999999999999999}. Best is trial 26 with value: 6.26.
[I 2023-08-24 10:27:31,579] Trial 48 finished with value: 6.26 and parameters: {'n_estimators': 321, 'max_depth': 3, 'min_child_samples': 9, 'learning_rate': 0.060000000000000005, 'subsample': 0.8400000000000001, 'colsample_bytree': 0.7}. Best is trial 26 with value: 6.26.
[I 2023-08-24 10:27:31,768] Trial 49 finished with value: 6.28 and parameters: {'n_estimators': 349, 'max_depth': 2, 'min_child_samples': 9, 'learning_rate': 0.08, 'subsample': 0.87, 'colsample_bytree': 0.7999999999999999}. Best is trial 26 with value: 6.26.
Best parameters: {'n_estimators': 269, 'max_depth': 4, 'min_child_samples': 8, 'learning_rate': 0.04, 'subsample': 0.77, 'colsample_bytree': 0.7999999999999999}
Best metric value: 6.26
| | n_estimators | max_depth | min_child_samples | learning_rate | subsample | colsample_bytree | score |
|---|---|---|---|---|---|---|---|
| 0 | 269.0 | 4.0 | 8.0 | 0.04 | 0.77 | 0.8 | 6.26 |
| 1 | 321.0 | 3.0 | 9.0 | 0.06 | 0.84 | 0.7 | 6.26 |
<timed exec>:57: ExperimentalWarning: plot_param_importances is experimental (supported from v2.2.0). The interface can change in the future.
<timed exec>:73: ExperimentalWarning: plot_optimization_history is experimental (supported from v2.2.0). The interface can change in the future.
Wall time: 17.2 s
# Draw the contour plot
fig = optuna.visualization.plot_contour(study, params=["max_depth", "n_estimators"])
fig.show()
# Draw the contour plot
fig = optuna.visualization.plot_contour(study, params=["learning_rate", "n_estimators"])
fig.show()
# Draw the contour plot
fig = optuna.visualization.plot_contour(study, params=["subsample", "n_estimators"])
fig.show()
Adjusting the parameter ranges once more and running the final search.
%%time
def objective(trial):
    # Define the search space
    n_estimators = trial.suggest_int('n_estimators', 150, 350)
    max_depth = trial.suggest_int('max_depth', 2, 6)
    min_child_samples = trial.suggest_int('min_child_samples', 6, 8)
    learning_rate = trial.suggest_float('learning_rate', 0.01, 0.1, step=0.01)
    subsample = trial.suggest_float('subsample', 0.5, 0.9, step=0.01)
    colsample_bytree = trial.suggest_float('colsample_bytree', 0.7, 1, step=0.1)
    # Build the model with the sampled parameters
    model = LGBMRegressor(n_estimators=n_estimators,
                          max_depth=max_depth,
                          min_child_samples=min_child_samples,
                          learning_rate=learning_rate,
                          subsample=subsample,
                          colsample_bytree=colsample_bytree,
                          random_state=RANDOM_STATE)
    # Evaluate the model with cross-validation (mean absolute error)
    score = round(abs(cross_val_score(model,
                                      features_train,
                                      target_train,
                                      cv=kf,
                                      scoring='neg_mean_absolute_error',
                                      n_jobs=-1)).mean(), 2)
    # score is already a scalar, so return it directly
    return score
# Create a Study object and run the optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=200)
print('Best parameters:', study.best_params)
print('Best metric value:', study.best_value)
# Build a dataframe with the hyperparameter values and the metric
results = pd.DataFrame({'n_estimators': [],
                        'max_depth': [],
                        'min_child_samples': [],
                        'learning_rate': [],
                        'subsample': [],
                        'colsample_bytree': [],
                        'score': []})
# Fill it with the data from best_trials
for trial in study.best_trials:
    params = trial.params
    params['score'] = trial.value
    # DataFrame.append() was removed in pandas 2.0; use pd.concat() instead
    results = pd.concat([results, pd.DataFrame([params])], ignore_index=True)
# Sort the dataframe by metric value in descending order
results = results.sort_values(by='score', ascending=False)
# Show the 10 best models
display(results.head(10))
# Hyperparameter importance plot
optuna.visualization.matplotlib.plot_param_importances(study)
plt.xlabel('Hyperparameter importance',
           fontsize=12,
           color='DarkSlateGray')
plt.ylabel('Hyperparameters',
           fontsize=12,
           color='DarkSlateGray')
plt.suptitle('Hyperparameter importance',
             fontsize=15,
             color='DarkSlateGray')
plt.minorticks_on()
plt.grid(which='minor',
         linestyle=':')
plt.grid(True)
# Optimization history plot
optuna.visualization.matplotlib.plot_optimization_history(study)
plt.tight_layout()
plt.xlabel('Iterations',
           fontsize=12,
           color='DarkSlateGray')
plt.ylabel('Objective metric',
           fontsize=12,
           color='DarkSlateGray')
plt.suptitle('Optimization history \n \n',
             fontsize=15,
             color='DarkSlateGray')
plt.minorticks_on()
plt.grid(which='minor',
         linestyle=':')
plt.grid(True)
plt.show()
# Re-create the model with the best parameters found
lgbm = LGBMRegressor(n_estimators=study.best_params['n_estimators'],
max_depth=study.best_params['max_depth'],
min_child_samples=study.best_params['min_child_samples'],
learning_rate=study.best_params['learning_rate'],
subsample=study.best_params['subsample'],
colsample_bytree=study.best_params['colsample_bytree'],
random_state=RANDOM_STATE)
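Since the keys of `study.best_params` match the `LGBMRegressor` argument names, the explicit keyword list above could also be written as `LGBMRegressor(**study.best_params, random_state=RANDOM_STATE)`. A pure-Python sketch of that equivalence; `make_model` and the parameter values are illustrative stand-ins, not the notebook's code:

```python
# best_params in the shape Optuna returns it (values are illustrative)
best_params = {'n_estimators': 269, 'max_depth': 4, 'learning_rate': 0.04}

def make_model(**kwargs):
    # Stand-in for LGBMRegressor(...): just records the keyword arguments
    return dict(kwargs, random_state=140823)

manual = make_model(n_estimators=best_params['n_estimators'],
                    max_depth=best_params['max_depth'],
                    learning_rate=best_params['learning_rate'])
unpacked = make_model(**best_params)  # same call, one line
assert manual == unpacked
```

Unpacking avoids silently dropping a tuned parameter if the search space later gains a new one.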
[I 2023-08-24 10:27:34,602] A new study created in memory with name: no-name-3796d158-4163-482b-ad38-0999a9d62315
[I 2023-08-24 10:27:34,746] Trial 0 finished with value: 6.3 and parameters: {'n_estimators': 290, 'max_depth': 2, 'min_child_samples': 6, 'learning_rate': 0.09999999999999999, 'subsample': 0.8, 'colsample_bytree': 1.0}. Best is trial 0 with value: 6.3.
[I 2023-08-24 10:27:34,934] Trial 1 finished with value: 6.37 and parameters: {'n_estimators': 265, 'max_depth': 3, 'min_child_samples': 6, 'learning_rate': 0.08, 'subsample': 0.78, 'colsample_bytree': 1.0}. Best is trial 0 with value: 6.3.
[I 2023-08-24 10:27:35,130] Trial 2 finished with value: 6.39 and parameters: {'n_estimators': 197, 'max_depth': 4, 'min_child_samples': 8, 'learning_rate': 0.09, 'subsample': 0.81, 'colsample_bytree': 1.0}. Best is trial 0 with value: 6.3.
[I 2023-08-24 10:27:35,224] Trial 3 finished with value: 6.31 and parameters: {'n_estimators': 190, 'max_depth': 2, 'min_child_samples': 8, 'learning_rate': 0.09999999999999999, 'subsample': 0.55, 'colsample_bytree': 0.7}. Best is trial 0 with value: 6.3.
[I 2023-08-24 10:27:35,418] Trial 4 finished with value: 6.29 and parameters: {'n_estimators': 151, 'max_depth': 4, 'min_child_samples': 8, 'learning_rate': 0.08, 'subsample': 0.72, 'colsample_bytree': 1.0}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:35,519] Trial 5 finished with value: 6.36 and parameters: {'n_estimators': 195, 'max_depth': 2, 'min_child_samples': 6, 'learning_rate': 0.060000000000000005, 'subsample': 0.66, 'colsample_bytree': 0.7}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:35,663] Trial 6 finished with value: 6.3 and parameters: {'n_estimators': 337, 'max_depth': 2, 'min_child_samples': 8, 'learning_rate': 0.060000000000000005, 'subsample': 0.72, 'colsample_bytree': 0.7999999999999999}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:35,799] Trial 7 finished with value: 6.29 and parameters: {'n_estimators': 295, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09999999999999999, 'subsample': 0.78, 'colsample_bytree': 0.7999999999999999}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:36,345] Trial 8 finished with value: 6.34 and parameters: {'n_estimators': 340, 'max_depth': 6, 'min_child_samples': 7, 'learning_rate': 0.02, 'subsample': 0.8200000000000001, 'colsample_bytree': 0.7}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:36,628] Trial 9 finished with value: 6.46 and parameters: {'n_estimators': 194, 'max_depth': 6, 'min_child_samples': 8, 'learning_rate': 0.09999999999999999, 'subsample': 0.72, 'colsample_bytree': 0.7999999999999999}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:36,877] Trial 10 finished with value: 6.36 and parameters: {'n_estimators': 152, 'max_depth': 5, 'min_child_samples': 7, 'learning_rate': 0.03, 'subsample': 0.9, 'colsample_bytree': 0.8999999999999999}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:37,179] Trial 11 finished with value: 6.42 and parameters: {'n_estimators': 298, 'max_depth': 4, 'min_child_samples': 7, 'learning_rate': 0.08, 'subsample': 0.64, 'colsample_bytree': 0.8999999999999999}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:37,372] Trial 12 finished with value: 6.3 and parameters: {'n_estimators': 238, 'max_depth': 3, 'min_child_samples': 7, 'learning_rate': 0.06999999999999999, 'subsample': 0.58, 'colsample_bytree': 0.8999999999999999}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:37,684] Trial 13 finished with value: 6.41 and parameters: {'n_estimators': 240, 'max_depth': 5, 'min_child_samples': 7, 'learning_rate': 0.08, 'subsample': 0.9, 'colsample_bytree': 0.7999999999999999}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:37,841] Trial 14 finished with value: 6.35 and parameters: {'n_estimators': 152, 'max_depth': 3, 'min_child_samples': 8, 'learning_rate': 0.05, 'subsample': 0.73, 'colsample_bytree': 0.8999999999999999}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:38,229] Trial 15 finished with value: 6.31 and parameters: {'n_estimators': 311, 'max_depth': 5, 'min_child_samples': 7, 'learning_rate': 0.04, 'subsample': 0.5, 'colsample_bytree': 0.7999999999999999}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:38,438] Trial 16 finished with value: 6.39 and parameters: {'n_estimators': 262, 'max_depth': 3, 'min_child_samples': 8, 'learning_rate': 0.09, 'subsample': 0.67, 'colsample_bytree': 1.0}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:38,673] Trial 17 finished with value: 6.35 and parameters: {'n_estimators': 217, 'max_depth': 4, 'min_child_samples': 6, 'learning_rate': 0.06999999999999999, 'subsample': 0.76, 'colsample_bytree': 0.8999999999999999}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:38,987] Trial 18 finished with value: 6.54 and parameters: {'n_estimators': 283, 'max_depth': 4, 'min_child_samples': 7, 'learning_rate': 0.01, 'subsample': 0.86, 'colsample_bytree': 0.7999999999999999}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:39,201] Trial 19 finished with value: 6.37 and parameters: {'n_estimators': 313, 'max_depth': 3, 'min_child_samples': 8, 'learning_rate': 0.09, 'subsample': 0.62, 'colsample_bytree': 0.7}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:39,455] Trial 20 finished with value: 6.36 and parameters: {'n_estimators': 170, 'max_depth': 5, 'min_child_samples': 7, 'learning_rate': 0.06999999999999999, 'subsample': 0.69, 'colsample_bytree': 1.0}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:39,625] Trial 21 finished with value: 6.31 and parameters: {'n_estimators': 279, 'max_depth': 2, 'min_child_samples': 6, 'learning_rate': 0.09999999999999999, 'subsample': 0.77, 'colsample_bytree': 1.0}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:39,804] Trial 22 finished with value: 6.31 and parameters: {'n_estimators': 312, 'max_depth': 2, 'min_child_samples': 6, 'learning_rate': 0.09999999999999999, 'subsample': 0.8300000000000001, 'colsample_bytree': 1.0}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:39,976] Trial 23 finished with value: 6.32 and parameters: {'n_estimators': 291, 'max_depth': 2, 'min_child_samples': 6, 'learning_rate': 0.09, 'subsample': 0.75, 'colsample_bytree': 0.8999999999999999}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:40,189] Trial 24 finished with value: 6.37 and parameters: {'n_estimators': 261, 'max_depth': 3, 'min_child_samples': 6, 'learning_rate': 0.08, 'subsample': 0.86, 'colsample_bytree': 1.0}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:40,335] Trial 25 finished with value: 6.29 and parameters: {'n_estimators': 223, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09999999999999999, 'subsample': 0.8, 'colsample_bytree': 0.8999999999999999}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:40,576] Trial 26 finished with value: 6.41 and parameters: {'n_estimators': 221, 'max_depth': 4, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.86, 'colsample_bytree': 0.7999999999999999}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:40,740] Trial 27 finished with value: 6.32 and parameters: {'n_estimators': 170, 'max_depth': 3, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.79, 'colsample_bytree': 0.8999999999999999}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:40,880] Trial 28 finished with value: 6.37 and parameters: {'n_estimators': 224, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.05, 'subsample': 0.69, 'colsample_bytree': 0.7999999999999999}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:41,017] Trial 29 finished with value: 6.33 and parameters: {'n_estimators': 176, 'max_depth': 2, 'min_child_samples': 8, 'learning_rate': 0.09999999999999999, 'subsample': 0.73, 'colsample_bytree': 0.8999999999999999}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:41,433] Trial 30 finished with value: 6.47 and parameters: {'n_estimators': 242, 'max_depth': 6, 'min_child_samples': 7, 'learning_rate': 0.08, 'subsample': 0.8400000000000001, 'colsample_bytree': 0.8999999999999999}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:41,600] Trial 31 finished with value: 6.3 and parameters: {'n_estimators': 272, 'max_depth': 2, 'min_child_samples': 6, 'learning_rate': 0.09999999999999999, 'subsample': 0.79, 'colsample_bytree': 1.0}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:41,851] Trial 32 finished with value: 6.48 and parameters: {'n_estimators': 325, 'max_depth': 3, 'min_child_samples': 6, 'learning_rate': 0.09999999999999999, 'subsample': 0.8, 'colsample_bytree': 1.0}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:42,039] Trial 33 finished with value: 6.32 and parameters: {'n_estimators': 296, 'max_depth': 2, 'min_child_samples': 8, 'learning_rate': 0.09, 'subsample': 0.75, 'colsample_bytree': 1.0}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:42,248] Trial 34 finished with value: 6.35 and parameters: {'n_estimators': 255, 'max_depth': 3, 'min_child_samples': 8, 'learning_rate': 0.08, 'subsample': 0.78, 'colsample_bytree': 1.0}. Best is trial 4 with value: 6.29.
[I 2023-08-24 10:27:42,428] Trial 35 finished with value: 6.3 and parameters: {'n_estimators': 327, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.81, 'colsample_bytree': 1.0}. Best is trial 4 with value: 6.29.
...
[I 2023-08-24 10:28:08,633] Trial 173 finished with value: 6.25 and parameters: {'n_estimators': 275, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.67, 'colsample_bytree': 0.7}. Best is trial 120 with value: 6.25.
(trials 36–172 omitted for readability: the search quickly converges to max_depth=2, min_child_samples=7, learning_rate≈0.09–0.1, subsample≈0.65–0.76 and colsample_bytree=0.7; the best result is trial 120 with MAE 6.25 and parameters {'n_estimators': 276, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.76, 'colsample_bytree': 0.7})
[I 2023-08-24 10:28:08,807] Trial 174 finished with value: 6.25 and parameters: {'n_estimators': 275, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.67, 'colsample_bytree': 0.7}. Best is trial 120 with value: 6.25.
[I 2023-08-24 10:28:08,983] Trial 175 finished with value: 6.26 and parameters: {'n_estimators': 260, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.67, 'colsample_bytree': 0.7}. Best is trial 120 with value: 6.25.
[I 2023-08-24 10:28:09,160] Trial 176 finished with value: 6.26 and parameters: {'n_estimators': 267, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.69, 'colsample_bytree': 0.7}. Best is trial 120 with value: 6.25.
[I 2023-08-24 10:28:09,340] Trial 177 finished with value: 6.25 and parameters: {'n_estimators': 274, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.67, 'colsample_bytree': 0.7}. Best is trial 120 with value: 6.25.
[I 2023-08-24 10:28:09,504] Trial 178 finished with value: 6.26 and parameters: {'n_estimators': 270, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.67, 'colsample_bytree': 0.7}. Best is trial 120 with value: 6.25.
[I 2023-08-24 10:28:09,674] Trial 179 finished with value: 6.25 and parameters: {'n_estimators': 277, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.6799999999999999, 'colsample_bytree': 0.7}. Best is trial 120 with value: 6.25.
[I 2023-08-24 10:28:09,853] Trial 180 finished with value: 6.25 and parameters: {'n_estimators': 279, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.67, 'colsample_bytree': 0.7}. Best is trial 120 with value: 6.25.
[I 2023-08-24 10:28:10,027] Trial 181 finished with value: 6.25 and parameters: {'n_estimators': 278, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.67, 'colsample_bytree': 0.7}. Best is trial 120 with value: 6.25.
[I 2023-08-24 10:28:10,198] Trial 182 finished with value: 6.25 and parameters: {'n_estimators': 264, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.6799999999999999, 'colsample_bytree': 0.7}. Best is trial 120 with value: 6.25.
[I 2023-08-24 10:28:10,371] Trial 183 finished with value: 6.26 and parameters: {'n_estimators': 273, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.67, 'colsample_bytree': 0.7}. Best is trial 120 with value: 6.25.
[I 2023-08-24 10:28:10,557] Trial 184 finished with value: 6.25 and parameters: {'n_estimators': 280, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.66, 'colsample_bytree': 0.7}. Best is trial 120 with value: 6.25.
[I 2023-08-24 10:28:10,733] Trial 185 finished with value: 6.25 and parameters: {'n_estimators': 276, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.6799999999999999, 'colsample_bytree': 0.7}. Best is trial 120 with value: 6.25.
[I 2023-08-24 10:28:10,912] Trial 186 finished with value: 6.26 and parameters: {'n_estimators': 268, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.6799999999999999, 'colsample_bytree': 0.7}. Best is trial 120 with value: 6.25.
[I 2023-08-24 10:28:11,088] Trial 187 finished with value: 6.26 and parameters: {'n_estimators': 257, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.69, 'colsample_bytree': 0.7}. Best is trial 120 with value: 6.25.
[I 2023-08-24 10:28:11,273] Trial 188 finished with value: 6.25 and parameters: {'n_estimators': 281, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.69, 'colsample_bytree': 0.7}. Best is trial 120 with value: 6.25.
[I 2023-08-24 10:28:11,610] Trial 189 finished with value: 6.51 and parameters: {'n_estimators': 272, 'max_depth': 5, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.66, 'colsample_bytree': 0.7}. Best is trial 120 with value: 6.25.
[I 2023-08-24 10:28:11,787] Trial 190 finished with value: 6.25 and parameters: {'n_estimators': 276, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.67, 'colsample_bytree': 0.7}. Best is trial 120 with value: 6.25.
[I 2023-08-24 10:28:11,971] Trial 191 finished with value: 6.25 and parameters: {'n_estimators': 278, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.67, 'colsample_bytree': 0.7}. Best is trial 120 with value: 6.25.
[I 2023-08-24 10:28:12,151] Trial 192 finished with value: 6.25 and parameters: {'n_estimators': 279, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.6799999999999999, 'colsample_bytree': 0.7}. Best is trial 120 with value: 6.25.
[I 2023-08-24 10:28:12,327] Trial 193 finished with value: 6.26 and parameters: {'n_estimators': 271, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.66, 'colsample_bytree': 0.7}. Best is trial 120 with value: 6.25.
[I 2023-08-24 10:28:12,504] Trial 194 finished with value: 6.26 and parameters: {'n_estimators': 267, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.6799999999999999, 'colsample_bytree': 0.7}. Best is trial 120 with value: 6.25.
[I 2023-08-24 10:28:12,697] Trial 195 finished with value: 6.25 and parameters: {'n_estimators': 276, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.67, 'colsample_bytree': 0.7}. Best is trial 120 with value: 6.25.
[I 2023-08-24 10:28:12,879] Trial 196 finished with value: 6.25 and parameters: {'n_estimators': 281, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.69, 'colsample_bytree': 0.7}. Best is trial 120 with value: 6.25.
[I 2023-08-24 10:28:13,051] Trial 197 finished with value: 6.25 and parameters: {'n_estimators': 264, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.67, 'colsample_bytree': 0.7}. Best is trial 120 with value: 6.25.
[I 2023-08-24 10:28:13,232] Trial 198 finished with value: 6.26 and parameters: {'n_estimators': 273, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.69, 'colsample_bytree': 0.7}. Best is trial 120 with value: 6.25.
[I 2023-08-24 10:28:13,406] Trial 199 finished with value: 6.25 and parameters: {'n_estimators': 278, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.67, 'colsample_bytree': 0.7}. Best is trial 120 with value: 6.25.
Best parameters: {'n_estimators': 276, 'max_depth': 2, 'min_child_samples': 7, 'learning_rate': 0.09, 'subsample': 0.76, 'colsample_bytree': 0.7}
Best metric value: 6.25
| | n_estimators | max_depth | min_child_samples | learning_rate | subsample | colsample_bytree | score |
|---|---|---|---|---|---|---|---|
| 0 | 276.0 | 2.0 | 7.0 | 0.09 | 0.76 | 0.7 | 6.25 |
| 16 | 274.0 | 2.0 | 7.0 | 0.09 | 0.67 | 0.7 | 6.25 |
| 29 | 264.0 | 2.0 | 7.0 | 0.09 | 0.67 | 0.7 | 6.25 |
| 28 | 281.0 | 2.0 | 7.0 | 0.09 | 0.69 | 0.7 | 6.25 |
| 27 | 276.0 | 2.0 | 7.0 | 0.09 | 0.67 | 0.7 | 6.25 |
| 26 | 279.0 | 2.0 | 7.0 | 0.09 | 0.68 | 0.7 | 6.25 |
| 25 | 278.0 | 2.0 | 7.0 | 0.09 | 0.67 | 0.7 | 6.25 |
| 24 | 276.0 | 2.0 | 7.0 | 0.09 | 0.67 | 0.7 | 6.25 |
| 23 | 281.0 | 2.0 | 7.0 | 0.09 | 0.69 | 0.7 | 6.25 |
| 22 | 276.0 | 2.0 | 7.0 | 0.09 | 0.68 | 0.7 | 6.25 |
<timed exec>:57: ExperimentalWarning: plot_param_importances is experimental (supported from v2.2.0). The interface can change in the future.
<timed exec>:73: ExperimentalWarning: plot_optimization_history is experimental (supported from v2.2.0). The interface can change in the future.
Wall time: 43.8 s
# Draw a contour plot
fig = optuna.visualization.plot_contour(study, params=["max_depth", "n_estimators"])
fig.show()
lgbm
LGBMRegressor(colsample_bytree=0.7, learning_rate=0.09, max_depth=2,
min_child_samples=7, n_estimators=276, random_state=140823,
subsample=0.76)
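Once a study finishes, the tuned estimator can be reconstructed directly from `study.best_params`, as the fitted model above shows. A minimal, self-contained sketch of that step follows; it uses synthetic data in place of the notebook's `features_train`/`target_train` and sklearn's `GradientBoostingRegressor` as a stand-in (LightGBM and `kf` are assumed from the notebook context, and the LightGBM-specific parameters are dropped):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the notebook's features_train / target_train
X, y = make_regression(n_samples=300, n_features=10, noise=15.0, random_state=140823)

# The study's best parameters (min_child_samples and colsample_bytree are
# LightGBM-specific, so only the shared ones are passed to the stand-in model)
best_params = {'n_estimators': 276, 'max_depth': 2,
               'learning_rate': 0.09, 'subsample': 0.76}

model = GradientBoostingRegressor(**best_params, random_state=140823)
kf = KFold(n_splits=3, shuffle=True, random_state=140823)
mae_cv = -cross_val_score(model, X, y, cv=kf,
                          scoring='neg_mean_absolute_error').mean()
print(round(mae_cv, 2))
```

The same `**best_params` unpacking works for the real `LGBMRegressor`, since the study's parameter names match the estimator's keyword arguments.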
%%time
def objective(trial):
    # Hyperparameter search space
    iterations = trial.suggest_int('iterations', 100, 500)
    depth = trial.suggest_int('depth', 1, 10)
    learning_rate = trial.suggest_float('learning_rate', 0.01, 0.1, step=0.01)
    # Build the model with the sampled hyperparameters
    model = CatBoostRegressor(iterations=iterations,
                              depth=depth,
                              learning_rate=learning_rate,
                              random_state=RANDOM_STATE,
                              silent=True)
    # Evaluate the model with cross-validation
    # (neg_mean_absolute_error is negated back to a positive MAE)
    scores = -cross_val_score(model,
                              features_train,
                              target_train,
                              cv=kf,
                              scoring='neg_mean_absolute_error',
                              n_jobs=-1)
    # Return the mean MAE rounded to two decimals
    return round(scores.mean(), 2)
# Create a Study object and run the optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=50)
print('Best parameters:', study.best_params)
print('Best metric value:', study.best_value)
# Collect the hyperparameter values and the metric into a dataframe
# (DataFrame.append was removed in pandas 2.0, so accumulate dicts in a list)
rows = []
for trial in study.best_trials:
    params = dict(trial.params)
    params['score'] = trial.value
    rows.append(params)
results = pd.DataFrame(rows, columns=['iterations', 'depth', 'learning_rate', 'score'])
# Sort ascending: the smaller the MAE, the better the model
results = results.sort_values(by='score')
# Show the 10 best models
display(results.head(10))
# Plot hyperparameter importances
optuna.visualization.matplotlib.plot_param_importances(study)
plt.xlabel('Hyperparameter importance', fontsize=12, color='DarkSlateGray')
plt.ylabel('Hyperparameters', fontsize=12, color='DarkSlateGray')
plt.suptitle('Hyperparameter importances', fontsize=15, color='DarkSlateGray')
plt.minorticks_on()
plt.grid(which='minor', linestyle=':')
plt.grid(True)
# Plot the optimization history
optuna.visualization.matplotlib.plot_optimization_history(study)
plt.tight_layout()
plt.xlabel('Trial', fontsize=12, color='DarkSlateGray')
plt.ylabel('Objective value', fontsize=12, color='DarkSlateGray')
plt.suptitle('Optimization history \n \n', fontsize=15, color='DarkSlateGray')
plt.minorticks_on()
plt.grid(which='minor', linestyle=':')
plt.grid(True)
plt.show()
[I 2023-08-24 10:28:18,439] A new study created in memory with name: no-name-040408ae-6f0a-42fc-88b3-1dc1a49dc00d
[I 2023-08-24 10:28:21,548] Trial 0 finished with value: 6.37 and parameters: {'iterations': 480, 'depth': 4, 'learning_rate': 0.09}. Best is trial 0 with value: 6.37.
... (trials 1–48 omitted: values ranged from 6.3 to 6.93; trial 28 became the best with 6.3) ...
[I 2023-08-24 10:33:24,694] Trial 49 finished with value: 6.38 and parameters: {'iterations': 174, 'depth': 5, 'learning_rate': 0.05}. Best is trial 28 with value: 6.3.
Best parameters: {'iterations': 239, 'depth': 7, 'learning_rate': 0.08}
Best metric value: 6.3
| | iterations | depth | learning_rate | score |
|---|---|---|---|---|
| 0 | 239.0 | 7.0 | 0.08 | 6.3 |
| 1 | 212.0 | 7.0 | 0.06 | 6.3 |
| 2 | 221.0 | 7.0 | 0.08 | 6.3 |
<timed exec>:49: ExperimentalWarning: plot_param_importances is experimental (supported from v2.2.0). The interface can change in the future.
<timed exec>:60: ExperimentalWarning: plot_optimization_history is experimental (supported from v2.2.0). The interface can change in the future.
Wall time: 5min 8s
# Draw a contour plot
fig = optuna.visualization.plot_contour(study, params=["iterations", "depth"])
fig.show()
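The contour plots can be cross-checked numerically: each trial is a plain parameter/score record, so a quick `groupby` over the log shows the same structure. A small sketch with a handful of trial values transcribed from the study log above:

```python
import pandas as pd

# A few CatBoost trials transcribed from the study log above
trials = pd.DataFrame([
    {'iterations': 480, 'depth': 4, 'score': 6.37},
    {'iterations': 205, 'depth': 7, 'score': 6.31},
    {'iterations': 114, 'depth': 2, 'score': 6.67},
    {'iterations': 247, 'depth': 7, 'score': 6.31},
    {'iterations': 239, 'depth': 7, 'score': 6.30},
])

# Mean MAE per depth: depth 7 clearly dominates in this sample
by_depth = trials.groupby('depth')['score'].mean().round(2)
print(by_depth)
```

In the full study the same aggregation can be run over `study.trials_dataframe()` instead of a hand-copied sample.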
# Draw a contour plot
fig = optuna.visualization.plot_contour(study, params=["iterations", "learning_rate"])
fig.show()
We narrow the hyperparameter search ranges.
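One way to derive such narrowed intervals systematically is to shrink each range around the current best value. The helper below is only an illustration of that idea, not how the ranges above were chosen: `narrowed_range` is a hypothetical name, and the 30% shrink factor is an arbitrary assumption.

```python
def narrowed_range(best, low, high, factor=0.3):
    """Shrink the interval [low, high] to a window around the best value."""
    span = (high - low) * factor
    return max(low, best - span), min(high, best + span)

# Best CatBoost parameters from the first study
best = {'iterations': 239, 'depth': 7, 'learning_rate': 0.08}

print(narrowed_range(best['iterations'], 100, 500))   # new iterations range
print(narrowed_range(best['depth'], 1, 10))           # new depth range
print(narrowed_range(best['learning_rate'], 0.01, 0.1))
```

In practice the windows are usually rounded to convenient values, and widened on the side where the contour plot still shows improvement.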
%%time
def objective(trial):
    # Narrowed hyperparameter search space
    iterations = trial.suggest_int('iterations', 250, 600)
    depth = trial.suggest_int('depth', 2, 8)
    learning_rate = trial.suggest_float('learning_rate', 0.01, 0.08, step=0.01)
    # Build the model with the sampled hyperparameters
    model = CatBoostRegressor(iterations=iterations,
                              depth=depth,
                              learning_rate=learning_rate,
                              random_state=RANDOM_STATE,
                              silent=True)
    # Evaluate the model with cross-validation
    # (neg_mean_absolute_error is negated back to a positive MAE)
    scores = -cross_val_score(model,
                              features_train,
                              target_train,
                              cv=kf,
                              scoring='neg_mean_absolute_error',
                              n_jobs=-1)
    # Return the mean MAE rounded to two decimals
    return round(scores.mean(), 2)
# Create a Study object and run the optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=50)
print('Best parameters:', study.best_params)
print('Best metric value:', study.best_value)
# Collect the hyperparameter values and the metric into a dataframe
# (DataFrame.append was removed in pandas 2.0, so accumulate dicts in a list)
rows = []
for trial in study.best_trials:
    params = dict(trial.params)
    params['score'] = trial.value
    rows.append(params)
results = pd.DataFrame(rows, columns=['iterations', 'depth', 'learning_rate', 'score'])
# Sort ascending: the smaller the MAE, the better the model
results = results.sort_values(by='score')
# Show the 10 best models
display(results.head(10))
# Plot hyperparameter importances
optuna.visualization.matplotlib.plot_param_importances(study)
plt.xlabel('Hyperparameter importance', fontsize=12, color='DarkSlateGray')
plt.ylabel('Hyperparameters', fontsize=12, color='DarkSlateGray')
plt.suptitle('Hyperparameter importances', fontsize=15, color='DarkSlateGray')
plt.minorticks_on()
plt.grid(which='minor', linestyle=':')
plt.grid(True)
# Plot the optimization history
optuna.visualization.matplotlib.plot_optimization_history(study)
plt.tight_layout()
plt.xlabel('Trial', fontsize=12, color='DarkSlateGray')
plt.ylabel('Objective value', fontsize=12, color='DarkSlateGray')
plt.suptitle('Optimization history \n \n', fontsize=15, color='DarkSlateGray')
plt.minorticks_on()
plt.grid(which='minor', linestyle=':')
plt.grid(True)
plt.show()
[I 2023-08-24 10:33:26,960] A new study created in memory with name: no-name-b3be6fe5-b320-4c8d-af84-dc7eed094f89
[I 2023-08-24 10:33:31,650] Trial 0 finished with value: 6.3 and parameters: {'iterations': 500, 'depth': 6, 'learning_rate': 0.04}. Best is trial 0 with value: 6.3.
... (trials 1–48 omitted: values ranged from 6.29 to 6.59; trial 21 became the best with 6.29) ...
[I 2023-08-24 10:35:49,188] Trial 49 finished with value: 6.29 and parameters: {'iterations': 329, 'depth': 6, 'learning_rate': 0.04}. Best is trial 21 with value: 6.29.
Best parameters: {'iterations': 278, 'depth': 4, 'learning_rate': 0.08}
Best metric value: 6.29
|   | iterations | depth | learning_rate | score |
|---|---|---|---|---|
| 0 | 278.0 | 4.0 | 0.08 | 6.29 |
| 1 | 346.0 | 5.0 | 0.06 | 6.29 |
| 2 | 305.0 | 5.0 | 0.07 | 6.29 |
| 3 | 329.0 | 6.0 | 0.04 | 6.29 |
Wall time: 2min 24s
# Contour plot: iterations vs depth
fig = optuna.visualization.plot_contour(study, params=["iterations", "depth"])
fig.show()
# Contour plot: iterations vs learning_rate
fig = optuna.visualization.plot_contour(study, params=["iterations", "learning_rate"])
fig.show()
We narrow the parameter ranges and run the final search.
%%time
def objective(trial):
    # Search space after narrowing the ranges
    iterations = trial.suggest_int('iterations', 300, 500)
    depth = trial.suggest_int('depth', 5, 7)
    learning_rate = trial.suggest_float('learning_rate', 0.02, 0.06, step=0.01)
    # Build the model with the sampled hyperparameters
    model = CatBoostRegressor(iterations=iterations,
                              depth=depth,
                              learning_rate=learning_rate,
                              random_state=RANDOM_STATE,
                              silent=True)
    # Estimate quality with cross-validation;
    # neg_mean_absolute_error is negated back to a positive MAE
    scores = cross_val_score(model,
                             features_train,
                             target_train,
                             cv=kf,
                             scoring='neg_mean_absolute_error',
                             n_jobs=-1)
    # Return the mean MAE over the folds
    return round(-scores.mean(), 2)
# Create the Study object and run the optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=200)
print('Best parameters:', study.best_params)
print('Best metric value:', study.best_value)
# Collect hyperparameter values and the metric into a dataframe
results = pd.DataFrame({'iterations': [],
                        'depth': [],
                        'learning_rate': [],
                        'score': []})
# Fill it from best_trials (DataFrame.append is deprecated, use pd.concat)
for trial in study.best_trials:
    params = trial.params
    params['score'] = trial.value
    results = pd.concat([results, pd.DataFrame([params])], ignore_index=True)
# Sort ascending: for MAE, smaller is better
results = results.sort_values(by='score', ascending=True)
# Show the 10 best models
display(results.head(10))
# Hyperparameter importance plot
optuna.visualization.matplotlib.plot_param_importances(study)
plt.xlabel('Importance',
           fontsize=12,
           color='DarkSlateGray')
plt.ylabel('Hyperparameter',
           fontsize=12,
           color='DarkSlateGray')
plt.suptitle('Hyperparameter importance',
             fontsize=15,
             color='DarkSlateGray')
plt.minorticks_on()
plt.grid(which='minor',
         linestyle=':')
plt.grid(True)
# Optimization history plot
optuna.visualization.matplotlib.plot_optimization_history(study)
plt.tight_layout()
plt.xlabel('Trial',
           fontsize=12,
           color='DarkSlateGray')
plt.ylabel('Objective value',
           fontsize=12,
           color='DarkSlateGray')
plt.suptitle('Optimization history \n \n',
             fontsize=15,
             color='DarkSlateGray')
plt.minorticks_on()
plt.grid(which='minor',
         linestyle=':')
plt.grid(True)
plt.show()
# Re-create the model with the best parameters found
catboost = CatBoostRegressor(iterations=study.best_params['iterations'],
                             depth=study.best_params['depth'],
                             learning_rate=study.best_params['learning_rate'],
                             random_state=RANDOM_STATE,
                             silent=True)
[I 2023-08-24 10:35:51,483] A new study created in memory with name: no-name-b1ca16d1-9e93-4d5a-bbe9-39ec2bed48d2
[I 2023-08-24 10:35:54,756] Trial 0 finished with value: 6.33 and parameters: {'iterations': 333, 'depth': 6, 'learning_rate': 0.06}. Best is trial 0 with value: 6.33.
[I 2023-08-24 10:35:58,415] Trial 1 finished with value: 6.35 and parameters: {'iterations': 416, 'depth': 6, 'learning_rate': 0.02}. Best is trial 0 with value: 6.33.
[I 2023-08-24 10:36:00,402] Trial 2 finished with value: 6.32 and parameters: {'iterations': 361, 'depth': 5, 'learning_rate': 0.03}. Best is trial 2 with value: 6.32.
[I 2023-08-24 10:36:04,443] Trial 3 finished with value: 6.29 and parameters: {'iterations': 461, 'depth': 6, 'learning_rate': 0.04}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:36:11,497] Trial 4 finished with value: 6.3 and parameters: {'iterations': 484, 'depth': 7, 'learning_rate': 0.03}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:36:13,929] Trial 5 finished with value: 6.3 and parameters: {'iterations': 446, 'depth': 5, 'learning_rate': 0.04}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:36:15,953] Trial 6 finished with value: 6.3 and parameters: {'iterations': 374, 'depth': 5, 'learning_rate': 0.04}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:36:19,679] Trial 7 finished with value: 6.34 and parameters: {'iterations': 422, 'depth': 6, 'learning_rate': 0.02}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:36:26,161] Trial 8 finished with value: 6.36 and parameters: {'iterations': 383, 'depth': 7, 'learning_rate': 0.02}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:36:28,305] Trial 9 finished with value: 6.33 and parameters: {'iterations': 344, 'depth': 5, 'learning_rate': 0.03}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:36:35,955] Trial 10 finished with value: 6.35 and parameters: {'iterations': 497, 'depth': 7, 'learning_rate': 0.06}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:36:44,235] Trial 11 finished with value: 6.33 and parameters: {'iterations': 491, 'depth': 7, 'learning_rate': 0.05}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:36:52,194] Trial 12 finished with value: 6.31 and parameters: {'iterations': 467, 'depth': 7, 'learning_rate': 0.03}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:36:56,507] Trial 13 finished with value: 6.34 and parameters: {'iterations': 457, 'depth': 6, 'learning_rate': 0.05}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:37:00,700] Trial 14 finished with value: 6.29 and parameters: {'iterations': 437, 'depth': 6, 'learning_rate': 0.04}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:37:04,969] Trial 15 finished with value: 6.34 and parameters: {'iterations': 436, 'depth': 6, 'learning_rate': 0.05}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:37:08,912] Trial 16 finished with value: 6.29 and parameters: {'iterations': 407, 'depth': 6, 'learning_rate': 0.04}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:37:11,718] Trial 17 finished with value: 6.33 and parameters: {'iterations': 301, 'depth': 6, 'learning_rate': 0.05}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:37:14,232] Trial 18 finished with value: 6.3 and parameters: {'iterations': 462, 'depth': 5, 'learning_rate': 0.04}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:37:18,474] Trial 19 finished with value: 6.29 and parameters: {'iterations': 434, 'depth': 6, 'learning_rate': 0.04}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:37:22,409] Trial 20 finished with value: 6.34 and parameters: {'iterations': 395, 'depth': 6, 'learning_rate': 0.05}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:37:26,147] Trial 21 finished with value: 6.29 and parameters: {'iterations': 413, 'depth': 6, 'learning_rate': 0.04}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:37:29,724] Trial 22 finished with value: 6.29 and parameters: {'iterations': 401, 'depth': 6, 'learning_rate': 0.04}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:37:34,285] Trial 23 finished with value: 6.29 and parameters: {'iterations': 476, 'depth': 6, 'learning_rate': 0.03}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:37:38,611] Trial 24 finished with value: 6.29 and parameters: {'iterations': 441, 'depth': 6, 'learning_rate': 0.04}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:37:43,183] Trial 25 finished with value: 6.29 and parameters: {'iterations': 453, 'depth': 6, 'learning_rate': 0.04}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:37:50,202] Trial 26 finished with value: 6.32 and parameters: {'iterations': 427, 'depth': 7, 'learning_rate': 0.05}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:37:52,414] Trial 27 finished with value: 6.3 and parameters: {'iterations': 404, 'depth': 5, 'learning_rate': 0.03}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:37:55,992] Trial 28 finished with value: 6.33 and parameters: {'iterations': 388, 'depth': 6, 'learning_rate': 0.05}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:37:58,859] Trial 29 finished with value: 6.33 and parameters: {'iterations': 473, 'depth': 5, 'learning_rate': 0.06}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:38:05,156] Trial 30 finished with value: 6.32 and parameters: {'iterations': 366, 'depth': 7, 'learning_rate': 0.03}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:38:09,025] Trial 31 finished with value: 6.29 and parameters: {'iterations': 430, 'depth': 6, 'learning_rate': 0.04}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:38:12,924] Trial 32 finished with value: 6.29 and parameters: {'iterations': 410, 'depth': 6, 'learning_rate': 0.04}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:38:16,942] Trial 33 finished with value: 6.29 and parameters: {'iterations': 443, 'depth': 6, 'learning_rate': 0.04}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:38:20,839] Trial 34 finished with value: 6.29 and parameters: {'iterations': 422, 'depth': 6, 'learning_rate': 0.04}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:38:25,479] Trial 35 finished with value: 6.29 and parameters: {'iterations': 448, 'depth': 6, 'learning_rate': 0.03}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:38:30,266] Trial 36 finished with value: 6.3 and parameters: {'iterations': 484, 'depth': 6, 'learning_rate': 0.04}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:38:34,086] Trial 37 finished with value: 6.34 and parameters: {'iterations': 434, 'depth': 6, 'learning_rate': 0.05}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:38:36,373] Trial 38 finished with value: 6.29 and parameters: {'iterations': 416, 'depth': 5, 'learning_rate': 0.04}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:38:39,487] Trial 39 finished with value: 6.31 and parameters: {'iterations': 348, 'depth': 6, 'learning_rate': 0.03}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:38:42,831] Trial 40 finished with value: 6.29 and parameters: {'iterations': 379, 'depth': 6, 'learning_rate': 0.04}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:38:46,472] Trial 41 finished with value: 6.29 and parameters: {'iterations': 412, 'depth': 6, 'learning_rate': 0.04}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:38:50,014] Trial 42 finished with value: 6.29 and parameters: {'iterations': 394, 'depth': 6, 'learning_rate': 0.04}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:38:54,074] Trial 43 finished with value: 6.29 and parameters: {'iterations': 419, 'depth': 6, 'learning_rate': 0.04}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:38:58,514] Trial 44 finished with value: 6.29 and parameters: {'iterations': 456, 'depth': 6, 'learning_rate': 0.03}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:39:04,745] Trial 45 finished with value: 6.29 and parameters: {'iterations': 407, 'depth': 7, 'learning_rate': 0.04}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:39:08,526] Trial 46 finished with value: 6.34 and parameters: {'iterations': 428, 'depth': 6, 'learning_rate': 0.02}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:39:12,645] Trial 47 finished with value: 6.35 and parameters: {'iterations': 467, 'depth': 6, 'learning_rate': 0.05}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:39:16,650] Trial 48 finished with value: 6.34 and parameters: {'iterations': 449, 'depth': 6, 'learning_rate': 0.05}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:39:18,460] Trial 49 finished with value: 6.36 and parameters: {'iterations': 306, 'depth': 5, 'learning_rate': 0.03}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:39:25,121] Trial 50 finished with value: 6.3 and parameters: {'iterations': 439, 'depth': 7, 'learning_rate': 0.04}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:39:28,628] Trial 51 finished with value: 6.29 and parameters: {'iterations': 399, 'depth': 6, 'learning_rate': 0.04}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:39:32,108] Trial 52 finished with value: 6.29 and parameters: {'iterations': 388, 'depth': 6, 'learning_rate': 0.04}. Best is trial 3 with value: 6.29.
[I 2023-08-24 10:39:35,413] Trial 53 finished with value: 6.28 and parameters: {'iterations': 373, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:39:38,617] Trial 54 finished with value: 6.28 and parameters: {'iterations': 368, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:39:41,797] Trial 55 finished with value: 6.28 and parameters: {'iterations': 357, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:39:44,984] Trial 56 finished with value: 6.28 and parameters: {'iterations': 361, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:39:48,126] Trial 57 finished with value: 6.31 and parameters: {'iterations': 353, 'depth': 6, 'learning_rate': 0.03}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:39:51,388] Trial 58 finished with value: 6.33 and parameters: {'iterations': 367, 'depth': 6, 'learning_rate': 0.05}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:39:54,309] Trial 59 finished with value: 6.28 and parameters: {'iterations': 331, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:39:57,416] Trial 60 finished with value: 6.28 and parameters: {'iterations': 336, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:40:00,459] Trial 61 finished with value: 6.29 and parameters: {'iterations': 329, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:40:03,408] Trial 62 finished with value: 6.29 and parameters: {'iterations': 327, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:40:06,385] Trial 63 finished with value: 6.29 and parameters: {'iterations': 338, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:40:09,209] Trial 64 finished with value: 6.29 and parameters: {'iterations': 319, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:40:12,366] Trial 65 finished with value: 6.28 and parameters: {'iterations': 357, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:40:15,557] Trial 66 finished with value: 6.28 and parameters: {'iterations': 356, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:40:18,562] Trial 67 finished with value: 6.29 and parameters: {'iterations': 342, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:40:21,898] Trial 68 finished with value: 6.3 and parameters: {'iterations': 373, 'depth': 6, 'learning_rate': 0.03}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:40:25,151] Trial 69 finished with value: 6.28 and parameters: {'iterations': 361, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:40:28,265] Trial 70 finished with value: 6.32 and parameters: {'iterations': 350, 'depth': 6, 'learning_rate': 0.05}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:40:31,410] Trial 71 finished with value: 6.28 and parameters: {'iterations': 356, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:40:34,445] Trial 72 finished with value: 6.28 and parameters: {'iterations': 360, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:40:37,751] Trial 73 finished with value: 6.28 and parameters: {'iterations': 373, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:40:40,755] Trial 74 finished with value: 6.28 and parameters: {'iterations': 337, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:40:43,613] Trial 75 finished with value: 6.29 and parameters: {'iterations': 322, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:40:46,871] Trial 76 finished with value: 6.28 and parameters: {'iterations': 367, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:40:49,977] Trial 77 finished with value: 6.29 and parameters: {'iterations': 344, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:40:53,384] Trial 78 finished with value: 6.29 and parameters: {'iterations': 381, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:40:56,554] Trial 79 finished with value: 6.28 and parameters: {'iterations': 357, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:40:59,361] Trial 80 finished with value: 6.33 and parameters: {'iterations': 313, 'depth': 6, 'learning_rate': 0.03}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:41:02,500] Trial 81 finished with value: 6.28 and parameters: {'iterations': 363, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:41:05,639] Trial 82 finished with value: 6.28 and parameters: {'iterations': 350, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:41:08,618] Trial 83 finished with value: 6.28 and parameters: {'iterations': 333, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:41:11,929] Trial 84 finished with value: 6.33 and parameters: {'iterations': 371, 'depth': 6, 'learning_rate': 0.05}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:41:15,299] Trial 85 finished with value: 6.29 and parameters: {'iterations': 378, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:41:18,343] Trial 86 finished with value: 6.29 and parameters: {'iterations': 345, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:41:21,571] Trial 87 finished with value: 6.28 and parameters: {'iterations': 362, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:41:25,704] Trial 88 finished with value: 6.29 and parameters: {'iterations': 387, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:41:29,110] Trial 89 finished with value: 6.28 and parameters: {'iterations': 356, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:41:32,145] Trial 90 finished with value: 6.28 and parameters: {'iterations': 336, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:41:35,221] Trial 91 finished with value: 6.28 and parameters: {'iterations': 358, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:41:38,353] Trial 92 finished with value: 6.28 and parameters: {'iterations': 352, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:41:41,663] Trial 93 finished with value: 6.28 and parameters: {'iterations': 369, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:41:45,113] Trial 94 finished with value: 6.29 and parameters: {'iterations': 377, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:41:48,159] Trial 95 finished with value: 6.29 and parameters: {'iterations': 342, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:41:51,401] Trial 96 finished with value: 6.28 and parameters: {'iterations': 364, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:41:54,537] Trial 97 finished with value: 6.28 and parameters: {'iterations': 355, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:41:57,495] Trial 98 finished with value: 6.29 and parameters: {'iterations': 330, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:42:00,559] Trial 99 finished with value: 6.29 and parameters: {'iterations': 348, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:42:04,097] Trial 100 finished with value: 6.32 and parameters: {'iterations': 384, 'depth': 6, 'learning_rate': 0.06}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:42:07,293] Trial 101 finished with value: 6.28 and parameters: {'iterations': 359, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:42:10,468] Trial 102 finished with value: 6.28 and parameters: {'iterations': 362, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:42:13,574] Trial 103 finished with value: 6.29 and parameters: {'iterations': 347, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:42:16,586] Trial 104 finished with value: 6.29 and parameters: {'iterations': 340, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:42:19,881] Trial 105 finished with value: 6.28 and parameters: {'iterations': 360, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:42:23,199] Trial 106 finished with value: 6.29 and parameters: {'iterations': 376, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:42:26,147] Trial 107 finished with value: 6.29 and parameters: {'iterations': 324, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:42:29,451] Trial 108 finished with value: 6.33 and parameters: {'iterations': 369, 'depth': 6, 'learning_rate': 0.05}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:42:32,563] Trial 109 finished with value: 6.28 and parameters: {'iterations': 353, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:42:35,556] Trial 110 finished with value: 6.32 and parameters: {'iterations': 334, 'depth': 6, 'learning_rate': 0.03}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:42:38,816] Trial 111 finished with value: 6.28 and parameters: {'iterations': 371, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:42:42,128] Trial 112 finished with value: 6.28 and parameters: {'iterations': 374, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:42:45,605] Trial 113 finished with value: 6.29 and parameters: {'iterations': 393, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:42:48,837] Trial 114 finished with value: 6.28 and parameters: {'iterations': 367, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:42:51,681] Trial 115 finished with value: 6.29 and parameters: {'iterations': 317, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:42:55,241] Trial 116 finished with value: 6.28 and parameters: {'iterations': 365, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:42:58,914] Trial 117 finished with value: 6.28 and parameters: {'iterations': 355, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:43:02,525] Trial 118 finished with value: 6.28 and parameters: {'iterations': 360, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:43:06,197] Trial 119 finished with value: 6.29 and parameters: {'iterations': 382, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:43:09,573] Trial 120 finished with value: 6.28 and parameters: {'iterations': 350, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:43:12,805] Trial 121 finished with value: 6.29 and parameters: {'iterations': 338, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:43:16,176] Trial 122 finished with value: 6.29 and parameters: {'iterations': 344, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:43:19,546] Trial 123 finished with value: 6.28 and parameters: {'iterations': 333, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:43:23,454] Trial 124 finished with value: 6.28 and parameters: {'iterations': 373, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:43:26,897] Trial 125 finished with value: 6.28 and parameters: {'iterations': 356, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:43:30,169] Trial 126 finished with value: 6.28 and parameters: {'iterations': 363, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:43:33,624] Trial 127 finished with value: 6.29 and parameters: {'iterations': 327, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:43:37,029] Trial 128 finished with value: 6.29 and parameters: {'iterations': 347, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:43:40,266] Trial 129 finished with value: 6.29 and parameters: {'iterations': 310, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:43:43,539] Trial 130 finished with value: 6.28 and parameters: {'iterations': 352, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:43:46,845] Trial 131 finished with value: 6.28 and parameters: {'iterations': 366, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:43:50,177] Trial 132 finished with value: 6.28 and parameters: {'iterations': 370, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:43:53,715] Trial 133 finished with value: 6.28 and parameters: {'iterations': 359, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:43:57,585] Trial 134 finished with value: 6.29 and parameters: {'iterations': 377, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:44:00,899] Trial 135 finished with value: 6.28 and parameters: {'iterations': 366, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:44:04,117] Trial 136 finished with value: 6.28 and parameters: {'iterations': 361, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:44:07,177] Trial 137 finished with value: 6.29 and parameters: {'iterations': 343, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:44:10,365] Trial 138 finished with value: 6.28 and parameters: {'iterations': 357, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:44:13,459] Trial 139 finished with value: 6.29 and parameters: {'iterations': 339, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:44:16,610] Trial 140 finished with value: 6.28 and parameters: {'iterations': 351, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:44:19,641] Trial 141 finished with value: 6.28 and parameters: {'iterations': 355, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:44:22,913] Trial 142 finished with value: 6.28 and parameters: {'iterations': 369, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:44:26,136] Trial 143 finished with value: 6.28 and parameters: {'iterations': 364, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:44:29,540] Trial 144 finished with value: 6.28 and parameters: {'iterations': 374, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:44:32,725] Trial 145 finished with value: 6.28 and parameters: {'iterations': 361, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:44:35,815] Trial 146 finished with value: 6.29 and parameters: {'iterations': 346, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:44:39,050] Trial 147 finished with value: 6.29 and parameters: {'iterations': 384, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:44:42,397] Trial 148 finished with value: 6.28 and parameters: {'iterations': 356, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:44:45,898] Trial 149 finished with value: 6.29 and parameters: {'iterations': 379, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:44:48,823] Trial 150 finished with value: 6.28 and parameters: {'iterations': 332, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:44:52,176] Trial 151 finished with value: 6.28 and parameters: {'iterations': 366, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:44:55,453] Trial 152 finished with value: 6.28 and parameters: {'iterations': 371, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:44:58,633] Trial 153 finished with value: 6.28 and parameters: {'iterations': 358, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:45:01,750] Trial 154 finished with value: 6.28 and parameters: {'iterations': 351, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:45:05,015] Trial 155 finished with value: 6.28 and parameters: {'iterations': 364, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:45:07,901] Trial 156 finished with value: 6.29 and parameters: {'iterations': 324, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:45:11,110] Trial 157 finished with value: 6.28 and parameters: {'iterations': 361, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:45:14,136] Trial 158 finished with value: 6.28 and parameters: {'iterations': 336, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:45:17,416] Trial 159 finished with value: 6.28 and parameters: {'iterations': 368, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:45:20,550] Trial 160 finished with value: 6.28 and parameters: {'iterations': 354, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:45:23,633] Trial 161 finished with value: 6.28 and parameters: {'iterations': 350, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:45:26,695] Trial 162 finished with value: 6.29 and parameters: {'iterations': 348, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:45:29,799] Trial 163 finished with value: 6.29 and parameters: {'iterations': 341, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:45:32,966] Trial 164 finished with value: 6.28 and parameters: {'iterations': 359, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:45:36,181] Trial 165 finished with value: 6.28 and parameters: {'iterations': 363, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:45:39,349] Trial 166 finished with value: 6.28 and parameters: {'iterations': 355, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:45:43,760] Trial 167 finished with value: 6.3 and parameters: {'iterations': 500, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:45:46,810] Trial 168 finished with value: 6.41 and parameters: {'iterations': 346, 'depth': 6, 'learning_rate': 0.02}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:45:49,983] Trial 169 finished with value: 6.28 and parameters: {'iterations': 353, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:45:53,267] Trial 170 finished with value: 6.28 and parameters: {'iterations': 374, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:45:56,193] Trial 171 finished with value: 6.29 and parameters: {'iterations': 329, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:45:59,190] Trial 172 finished with value: 6.28 and parameters: {'iterations': 335, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:46:02,068] Trial 173 finished with value: 6.29 and parameters: {'iterations': 319, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:46:05,070] Trial 174 finished with value: 6.29 and parameters: {'iterations': 341, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:46:08,312] Trial 175 finished with value: 6.28 and parameters: {'iterations': 362, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:46:11,575] Trial 176 finished with value: 6.28 and parameters: {'iterations': 368, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:46:14,527] Trial 177 finished with value: 6.28 and parameters: {'iterations': 332, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:46:17,694] Trial 178 finished with value: 6.28 and parameters: {'iterations': 357, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:46:20,779] Trial 179 finished with value: 6.28 and parameters: {'iterations': 350, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:46:24,047] Trial 180 finished with value: 6.29 and parameters: {'iterations': 327, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:46:27,783] Trial 181 finished with value: 6.28 and parameters: {'iterations': 362, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:46:31,081] Trial 182 finished with value: 6.28 and parameters: {'iterations': 364, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:46:34,286] Trial 183 finished with value: 6.28 and parameters: {'iterations': 358, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:46:37,613] Trial 184 finished with value: 6.28 and parameters: {'iterations': 372, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:46:40,886] Trial 185 finished with value: 6.28 and parameters: {'iterations': 367, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:46:44,170] Trial 186 finished with value: 6.28 and parameters: {'iterations': 359, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:46:47,209] Trial 187 finished with value: 6.29 and parameters: {'iterations': 338, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:46:50,331] Trial 188 finished with value: 6.28 and parameters: {'iterations': 353, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:46:53,650] Trial 189 finished with value: 6.28 and parameters: {'iterations': 370, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:46:56,726] Trial 190 finished with value: 6.29 and parameters: {'iterations': 348, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:46:59,921] Trial 191 finished with value: 6.28 and parameters: {'iterations': 357, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:47:03,245] Trial 192 finished with value: 6.28 and parameters: {'iterations': 363, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:47:06,527] Trial 193 finished with value: 6.28 and parameters: {'iterations': 354, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:47:09,432] Trial 194 finished with value: 6.29 and parameters: {'iterations': 344, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:47:12,609] Trial 195 finished with value: 6.28 and parameters: {'iterations': 360, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:47:15,869] Trial 196 finished with value: 6.28 and parameters: {'iterations': 364, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:47:19,018] Trial 197 finished with value: 6.28 and parameters: {'iterations': 356, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:47:22,142] Trial 198 finished with value: 6.28 and parameters: {'iterations': 351, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
[I 2023-08-24 10:47:25,494] Trial 199 finished with value: 6.29 and parameters: {'iterations': 377, 'depth': 6, 'learning_rate': 0.04}. Best is trial 53 with value: 6.28.
Best parameters: {'iterations': 373, 'depth': 6, 'learning_rate': 0.04}
Best metric value: 6.28
|   | iterations | depth | learning_rate | score |
|---|---|---|---|---|
| 0 | 373.0 | 6.0 | 0.04 | 6.28 |
| 58 | 358.0 | 6.0 | 0.04 | 6.28 |
| 67 | 363.0 | 6.0 | 0.04 | 6.28 |
| 66 | 359.0 | 6.0 | 0.04 | 6.28 |
| 65 | 350.0 | 6.0 | 0.04 | 6.28 |
| 64 | 354.0 | 6.0 | 0.04 | 6.28 |
| 63 | 368.0 | 6.0 | 0.04 | 6.28 |
| 62 | 336.0 | 6.0 | 0.04 | 6.28 |
| 61 | 361.0 | 6.0 | 0.04 | 6.28 |
| 60 | 364.0 | 6.0 | 0.04 | 6.28 |
<timed exec>:49: ExperimentalWarning: plot_param_importances is experimental (supported from v2.2.0). The interface can change in the future.
<timed exec>:65: ExperimentalWarning: plot_optimization_history is experimental (supported from v2.2.0). The interface can change in the future.
Wall time: 11min 36s
catboost.get_params()
{'iterations': 373,
'learning_rate': 0.04,
'depth': 6,
'loss_function': 'RMSE',
'silent': True,
'random_state': 140823}
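`get_params()` confirms that the tuned values ended up on the model. More generally, Optuna's `study.best_params` dict can be unpacked straight into a constructor with `**`. A minimal self-contained sketch using a hand-written params dict and a scikit-learn regressor as stand-ins (CatBoost itself is not assumed here):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical best-params dict, shaped like what study.best_params returns
best_params = {'n_estimators': 50, 'max_depth': 6, 'random_state': 140823}

# Synthetic data purely for illustration
rng = np.random.default_rng(140823)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=200)

# ** unpacks the dict into keyword arguments of the constructor
model = RandomForestRegressor(**best_params).fit(X, y)
print(model.get_params()['max_depth'])  # the tuned value is now part of the model config
```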
%%time
def objective(trial):
    # Define the hyperparameter search space
    C = trial.suggest_float('C', 0.01, 1000, step=0.01)
    epsilon = trial.suggest_float('epsilon', 0.01, 10, step=0.01)
    kernel = trial.suggest_categorical('kernel', ['linear', 'rbf', 'poly'])
    degree = trial.suggest_int('degree', 2, 5)
    # Build an SVR model with the suggested parameters
    model = SVR(C=C,
                epsilon=epsilon,
                kernel=kernel,
                degree=degree,
                max_iter=-1)
    # Evaluate the model with cross-validation (MAE)
    score = round(abs(cross_val_score(model,
                                      features_train,
                                      target_train,
                                      cv=kf,
                                      scoring='neg_mean_absolute_error',
                                      n_jobs=-1)).mean(), 2)
    # score is already a scalar, so return it as-is
    return score
# Create a Study object and run the optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=50)
print('Best parameters:', study.best_params)
print('Best metric value:', study.best_value)
# Build a dataframe with the hyperparameter values and the metric
results = pd.DataFrame(
    {'C': [],
     'epsilon': [],
     'kernel': [],
     'degree': [],
     'score': []}
)
# Fill it from best_trials
for trial in study.best_trials:
    params = trial.params
    params['score'] = trial.value
    # DataFrame.append was removed in pandas 2.0 — use pd.concat instead
    results = pd.concat([results, pd.DataFrame([params])], ignore_index=True)
# Sort ascending so the best (lowest-MAE) models come first
results = results.sort_values(by='score', ascending=True)
# Show the 10 best models
display(results.head(10))
# Plot hyperparameter importances
optuna.visualization.matplotlib.plot_param_importances(study)
plt.xlabel('Importance',
           fontsize=12,
           color='DarkSlateGray')
plt.ylabel('Hyperparameters',
           fontsize=12,
           color='DarkSlateGray')
plt.suptitle('Hyperparameters Importance',
             fontsize=15,
             color='DarkSlateGray')
plt.minorticks_on()
plt.grid(which='minor',
         linestyle=':')
plt.grid(True)
# Plot the optimization history
optuna.visualization.matplotlib.plot_optimization_history(study)
plt.tight_layout()
plt.xlabel('Iterations',
           fontsize=12,
           color='DarkSlateGray')
plt.ylabel('Objective Value',
           fontsize=12,
           color='DarkSlateGray')
plt.suptitle('Optimization History \n \n',
             fontsize=15,
             color='DarkSlateGray')
plt.minorticks_on()
plt.grid(which='minor',
         linestyle=':')
plt.grid(True)
plt.show()
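The `objective` above takes `abs()` of the cross-validation scores because scikit-learn reports error metrics negated, so that "greater is better" holds uniformly for every scorer. A self-contained check on synthetic data (nothing from the notebook is assumed):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y,
                         cv=kf, scoring='neg_mean_absolute_error')

# Every fold score is <= 0: sklearn negates errors on purpose
assert (scores <= 0).all()
mae_cv = abs(scores).mean()  # the quantity the objective() minimizes
print(round(mae_cv, 2))
```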
[I 2023-08-24 10:47:28,072] A new study created in memory with name: no-name-b7da7403-18ec-4f79-9549-d4776d15c34a
[I 2023-08-24 10:47:28,517] Trial 0 finished with value: 7.19 and parameters: {'C': 390.02, 'epsilon': 7.71, 'kernel': 'rbf', 'degree': 4}. Best is trial 0 with value: 7.19.
[I 2023-08-24 10:47:51,516] Trial 1 finished with value: 6.55 and parameters: {'C': 772.88, 'epsilon': 5.29, 'kernel': 'linear', 'degree': 5}. Best is trial 1 with value: 6.55.
[I 2023-08-24 10:48:18,053] Trial 2 finished with value: 6.53 and parameters: {'C': 816.83, 'epsilon': 3.3499999999999996, 'kernel': 'linear', 'degree': 3}. Best is trial 2 with value: 6.53.
[I 2023-08-24 10:48:18,471] Trial 3 finished with value: 45.6 and parameters: {'C': 240.6, 'epsilon': 2.36, 'kernel': 'poly', 'degree': 5}. Best is trial 2 with value: 6.53.
[I 2023-08-24 10:48:20,773] Trial 4 finished with value: 7.68 and parameters: {'C': 703.71, 'epsilon': 0.29000000000000004, 'kernel': 'rbf', 'degree': 3}. Best is trial 2 with value: 6.53.
[I 2023-08-24 10:48:21,449] Trial 5 finished with value: 21.55 and parameters: {'C': 201.5, 'epsilon': 1.37, 'kernel': 'poly', 'degree': 4}. Best is trial 2 with value: 6.53.
[I 2023-08-24 10:48:21,616] Trial 6 finished with value: 6.94 and parameters: {'C': 101.4, 'epsilon': 8.88, 'kernel': 'rbf', 'degree': 5}. Best is trial 2 with value: 6.53.
[I 2023-08-24 10:48:22,078] Trial 7 finished with value: 8.96 and parameters: {'C': 43.35, 'epsilon': 0.72, 'kernel': 'poly', 'degree': 3}. Best is trial 2 with value: 6.53.
[I 2023-08-24 10:48:49,417] Trial 8 finished with value: 6.53 and parameters: {'C': 712.3100000000001, 'epsilon': 3.3499999999999996, 'kernel': 'linear', 'degree': 3}. Best is trial 2 with value: 6.53.
[I 2023-08-24 10:48:49,741] Trial 9 finished with value: 10.31 and parameters: {'C': 184.16, 'epsilon': 7.71, 'kernel': 'poly', 'degree': 3}. Best is trial 2 with value: 6.53.
[I 2023-08-24 10:49:17,413] Trial 10 finished with value: 6.54 and parameters: {'C': 983.48, 'epsilon': 4.6499999999999995, 'kernel': 'linear', 'degree': 2}. Best is trial 2 with value: 6.53.
[I 2023-08-24 10:49:45,131] Trial 11 finished with value: 6.53 and parameters: {'C': 657.18, 'epsilon': 3.86, 'kernel': 'linear', 'degree': 2}. Best is trial 2 with value: 6.53.
[I 2023-08-24 10:50:16,884] Trial 12 finished with value: 6.53 and parameters: {'C': 929.58, 'epsilon': 3.4, 'kernel': 'linear', 'degree': 3}. Best is trial 2 with value: 6.53.
[I 2023-08-24 10:50:30,712] Trial 13 finished with value: 6.57 and parameters: {'C': 556.52, 'epsilon': 6.11, 'kernel': 'linear', 'degree': 2}. Best is trial 2 with value: 6.53.
[I 2023-08-24 10:51:15,231] Trial 14 finished with value: 6.53 and parameters: {'C': 837.3100000000001, 'epsilon': 2.3, 'kernel': 'linear', 'degree': 4}. Best is trial 2 with value: 6.53.
[I 2023-08-24 10:51:44,450] Trial 15 finished with value: 6.52 and parameters: {'C': 477.55, 'epsilon': 3.15, 'kernel': 'linear', 'degree': 3}. Best is trial 15 with value: 6.52.
[I 2023-08-24 10:51:58,228] Trial 16 finished with value: 6.57 and parameters: {'C': 403.87, 'epsilon': 6.12, 'kernel': 'linear', 'degree': 4}. Best is trial 15 with value: 6.52.
[I 2023-08-24 10:52:16,944] Trial 17 finished with value: 6.53 and parameters: {'C': 491.12, 'epsilon': 1.69, 'kernel': 'linear', 'degree': 2}. Best is trial 15 with value: 6.52.
[I 2023-08-24 10:52:41,681] Trial 18 finished with value: 6.53 and parameters: {'C': 576.21, 'epsilon': 4.43, 'kernel': 'linear', 'degree': 3}. Best is trial 15 with value: 6.52.
[I 2023-08-24 10:52:42,585] Trial 19 finished with value: 7.15 and parameters: {'C': 326.6, 'epsilon': 2.59, 'kernel': 'rbf', 'degree': 4}. Best is trial 15 with value: 6.52.
[I 2023-08-24 10:53:08,655] Trial 20 finished with value: 6.56 and parameters: {'C': 857.09, 'epsilon': 5.9399999999999995, 'kernel': 'linear', 'degree': 2}. Best is trial 15 with value: 6.52.
[I 2023-08-24 10:53:31,921] Trial 21 finished with value: 6.53 and parameters: {'C': 669.09, 'epsilon': 3.1599999999999997, 'kernel': 'linear', 'degree': 3}. Best is trial 15 with value: 6.52.
[I 2023-08-24 10:53:56,053] Trial 22 finished with value: 6.53 and parameters: {'C': 764.3100000000001, 'epsilon': 4.0, 'kernel': 'linear', 'degree': 3}. Best is trial 15 with value: 6.52.
[I 2023-08-24 10:54:19,528] Trial 23 finished with value: 6.52 and parameters: {'C': 511.2, 'epsilon': 3.0999999999999996, 'kernel': 'linear', 'degree': 3}. Best is trial 15 with value: 6.52.
[I 2023-08-24 10:54:42,681] Trial 24 finished with value: 6.53 and parameters: {'C': 483.12, 'epsilon': 1.53, 'kernel': 'linear', 'degree': 3}. Best is trial 15 with value: 6.52.
[I 2023-08-24 10:55:02,709] Trial 25 finished with value: 6.54 and parameters: {'C': 591.68, 'epsilon': 5.09, 'kernel': 'linear', 'degree': 3}. Best is trial 15 with value: 6.52.
[I 2023-08-24 10:55:14,635] Trial 26 finished with value: 6.51 and parameters: {'C': 315.3, 'epsilon': 2.7199999999999998, 'kernel': 'linear', 'degree': 4}. Best is trial 26 with value: 6.51.
[I 2023-08-24 10:55:28,045] Trial 27 finished with value: 6.52 and parameters: {'C': 327.37, 'epsilon': 2.6199999999999997, 'kernel': 'linear', 'degree': 4}. Best is trial 26 with value: 6.51.
[I 2023-08-24 10:55:28,935] Trial 28 finished with value: 27.42 and parameters: {'C': 426.01, 'epsilon': 1.06, 'kernel': 'poly', 'degree': 4}. Best is trial 26 with value: 6.51.
[I 2023-08-24 10:55:29,143] Trial 29 finished with value: 7.24 and parameters: {'C': 282.25, 'epsilon': 9.99, 'kernel': 'rbf', 'degree': 4}. Best is trial 26 with value: 6.51.
[I 2023-08-24 10:55:30,804] Trial 30 finished with value: 7.53 and parameters: {'C': 449.3, 'epsilon': 0.08, 'kernel': 'rbf', 'degree': 5}. Best is trial 26 with value: 6.51.
[I 2023-08-24 10:55:42,558] Trial 31 finished with value: 6.51 and parameters: {'C': 336.64, 'epsilon': 2.65, 'kernel': 'linear', 'degree': 4}. Best is trial 26 with value: 6.51.
[I 2023-08-24 10:55:58,660] Trial 32 finished with value: 6.53 and parameters: {'C': 353.79, 'epsilon': 1.8900000000000001, 'kernel': 'linear', 'degree': 4}. Best is trial 26 with value: 6.51.
[I 2023-08-24 10:56:15,293] Trial 33 finished with value: 6.51 and parameters: {'C': 379.11, 'epsilon': 2.8899999999999997, 'kernel': 'linear', 'degree': 4}. Best is trial 26 with value: 6.51.
[I 2023-08-24 10:56:20,021] Trial 34 finished with value: 6.51 and parameters: {'C': 142.16, 'epsilon': 4.01, 'kernel': 'linear', 'degree': 5}. Best is trial 26 with value: 6.51.
[I 2023-08-24 10:56:27,674] Trial 35 finished with value: 6.51 and parameters: {'C': 123.89000000000001, 'epsilon': 4.01, 'kernel': 'linear', 'degree': 5}. Best is trial 26 with value: 6.51.
[I 2023-08-24 10:56:28,473] Trial 36 finished with value: 6.45 and parameters: {'C': 10.97, 'epsilon': 2.1199999999999997, 'kernel': 'linear', 'degree': 5}. Best is trial 36 with value: 6.45.
[I 2023-08-24 10:56:28,780] Trial 37 finished with value: 16.5 and parameters: {'C': 22.490000000000002, 'epsilon': 2.01, 'kernel': 'poly', 'degree': 5}. Best is trial 36 with value: 6.45.
[I 2023-08-24 10:56:42,351] Trial 38 finished with value: 6.51 and parameters: {'C': 270.34, 'epsilon': 0.8300000000000001, 'kernel': 'linear', 'degree': 5}. Best is trial 36 with value: 6.45.
[I 2023-08-24 10:56:42,995] Trial 39 finished with value: 7.01 and parameters: {'C': 200.03, 'epsilon': 2.6999999999999997, 'kernel': 'rbf', 'degree': 4}. Best is trial 36 with value: 6.45.
[I 2023-08-24 10:56:45,868] Trial 40 finished with value: 6.51 and parameters: {'C': 67.12, 'epsilon': 1.21, 'kernel': 'linear', 'degree': 4}. Best is trial 36 with value: 6.45.
[I 2023-08-24 10:56:51,618] Trial 41 finished with value: 6.51 and parameters: {'C': 141.93, 'epsilon': 3.69, 'kernel': 'linear', 'degree': 5}. Best is trial 36 with value: 6.45.
[I 2023-08-24 10:56:52,109] Trial 42 finished with value: 6.46 and parameters: {'C': 7.26, 'epsilon': 4.38, 'kernel': 'linear', 'degree': 5}. Best is trial 36 with value: 6.45.
[I 2023-08-24 10:56:54,687] Trial 43 finished with value: 6.5 and parameters: {'C': 71.9, 'epsilon': 4.59, 'kernel': 'linear', 'degree': 5}. Best is trial 36 with value: 6.45.
[I 2023-08-24 10:56:54,874] Trial 44 finished with value: 11.13 and parameters: {'C': 4.06, 'epsilon': 5.43, 'kernel': 'poly', 'degree': 5}. Best is trial 36 with value: 6.45.
[I 2023-08-24 10:56:57,240] Trial 45 finished with value: 6.57 and parameters: {'C': 96.99000000000001, 'epsilon': 6.72, 'kernel': 'linear', 'degree': 5}. Best is trial 36 with value: 6.45.
[I 2023-08-24 10:56:59,541] Trial 46 finished with value: 6.5 and parameters: {'C': 58.67, 'epsilon': 4.64, 'kernel': 'linear', 'degree': 5}. Best is trial 36 with value: 6.45.
[I 2023-08-24 10:57:02,457] Trial 47 finished with value: 6.5 and parameters: {'C': 61.91, 'epsilon': 4.68, 'kernel': 'linear', 'degree': 5}. Best is trial 36 with value: 6.45.
[I 2023-08-24 10:57:05,819] Trial 48 finished with value: 6.5 and parameters: {'C': 71.71000000000001, 'epsilon': 4.6499999999999995, 'kernel': 'linear', 'degree': 5}. Best is trial 36 with value: 6.45.
[I 2023-08-24 10:57:06,041] Trial 49 finished with value: 20.33 and parameters: {'C': 45.449999999999996, 'epsilon': 5.49, 'kernel': 'poly', 'degree': 5}. Best is trial 36 with value: 6.45.
Best parameters: {'C': 10.97, 'epsilon': 2.1199999999999997, 'kernel': 'linear', 'degree': 5}
Best metric value: 6.45
|   | C | epsilon | kernel | degree | score |
|---|---|---|---|---|---|
| 0 | 10.97 | 2.12 | linear | 5.0 | 6.45 |
<timed exec>:54: ExperimentalWarning: plot_param_importances is experimental (supported from v2.2.0). The interface can change in the future.
<timed exec>:70: ExperimentalWarning: plot_optimization_history is experimental (supported from v2.2.0). The interface can change in the future.
Wall time: 9min 40s
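Note that the best trial reports `kernel='linear'` together with `degree=5`. In scikit-learn's `SVR` the `degree` parameter only affects the `poly` kernel and is ignored by all other kernels, so for the linear trials this value is irrelevant. A quick check on synthetic data:

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic data purely for illustration
rng = np.random.default_rng(42)
X = rng.normal(size=(80, 2))
y = X[:, 0] - 2 * X[:, 1]

# Same data, same linear kernel, different degree values
pred_a = SVR(kernel='linear', degree=2).fit(X, y).predict(X)
pred_b = SVR(kernel='linear', degree=5).fit(X, y).predict(X)

# degree is ignored for non-poly kernels, so the two fits coincide exactly
print(np.allclose(pred_a, pred_b))  # True
```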
# Plot the contour chart
fig = optuna.visualization.plot_contour(study, params=["kernel", "epsilon"])
fig.show()
# Plot the contour chart
fig = optuna.visualization.plot_contour(study, params=["C", "degree"])
fig.show()
Adjusting the hyperparameter ranges based on the first search.
%%time
def objective(trial):
    # Define the hyperparameter search space (narrowed after the first run)
    C = trial.suggest_float('C', 0.01, 600, step=0.01)
    epsilon = trial.suggest_float('epsilon', 1, 8, step=0.1)
    kernel = trial.suggest_categorical('kernel', ['linear'])
    degree = trial.suggest_int('degree', 3, 4)
    # Build an SVR model with the suggested parameters
    model = SVR(C=C,
                epsilon=epsilon,
                kernel=kernel,
                degree=degree,
                max_iter=-1)
    # Evaluate the model with cross-validation (MAE)
    score = round(abs(cross_val_score(model,
                                      features_train,
                                      target_train,
                                      cv=kf,
                                      scoring='neg_mean_absolute_error',
                                      n_jobs=-1)).mean(), 2)
    # score is already a scalar, so return it as-is
    return score
# Create a Study object and run the optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=100)
print('Best parameters:', study.best_params)
print('Best metric value:', study.best_value)
# Build a dataframe with the hyperparameter values and the metric
results = pd.DataFrame(
    {'C': [],
     'epsilon': [],
     'kernel': [],
     'degree': [],
     'score': []})
# Fill it from best_trials
for trial in study.best_trials:
    params = trial.params
    params['score'] = trial.value
    # DataFrame.append was removed in pandas 2.0 — use pd.concat instead
    results = pd.concat([results, pd.DataFrame([params])], ignore_index=True)
# Sort ascending so the best (lowest-MAE) models come first
results = results.sort_values(by='score', ascending=True)
# Show the 10 best models
display(results.head(10))
# Plot hyperparameter importances
optuna.visualization.matplotlib.plot_param_importances(study)
plt.xlabel('Importance',
           fontsize=12,
           color='DarkSlateGray')
plt.ylabel('Hyperparameters',
           fontsize=12,
           color='DarkSlateGray')
plt.suptitle('Hyperparameters Importance',
             fontsize=15,
             color='DarkSlateGray')
plt.minorticks_on()
plt.grid(which='minor',
         linestyle=':')
plt.grid(True)
# Plot the optimization history
optuna.visualization.matplotlib.plot_optimization_history(study)
plt.tight_layout()
plt.xlabel('Iterations',
           fontsize=12,
           color='DarkSlateGray')
plt.ylabel('Objective Value',
           fontsize=12,
           color='DarkSlateGray')
plt.suptitle('Optimization History \n \n',
             fontsize=15,
             color='DarkSlateGray')
plt.minorticks_on()
plt.grid(which='minor', linestyle=':')
plt.grid(True)
plt.show()
# Save the model with the best parameters
svr = SVR(C=study.best_params['C'],
          epsilon=study.best_params['epsilon'],
          kernel=study.best_params['kernel'],
          degree=study.best_params['degree'])
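SVR is sensitive to feature scale, and the notebook already imports `make_pipeline` and `StandardScaler`. Wrapping the tuned SVR in a pipeline refits the scaler inside every cross-validation fold, so no validation statistics leak into training. A sketch on hypothetical synthetic data standing in for `features_train`/`target_train`:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.model_selection import KFold, cross_val_score

# Synthetic features with wildly different scales (illustration only)
rng = np.random.default_rng(140823)
X = rng.normal(size=(150, 4)) * np.array([1.0, 10.0, 100.0, 0.1])
y = X @ np.array([0.5, 0.05, 0.005, 5.0]) + rng.normal(scale=0.1, size=150)

# The scaler is refit on the training part of every fold — no leakage
pipe = make_pipeline(StandardScaler(), SVR(C=10.0, epsilon=0.1, kernel='linear'))

kf = KFold(n_splits=5, shuffle=True, random_state=140823)
mae_cv = abs(cross_val_score(pipe, X, y, cv=kf,
                             scoring='neg_mean_absolute_error')).mean()
print(round(mae_cv, 2))
```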
[I 2023-08-24 10:57:08,292] A new study created in memory with name: no-name-a200c9c2-6db9-4536-9e7d-f570cf7c63cf
[I 2023-08-24 10:57:16,023] Trial 0 finished with value: 6.54 and parameters: {'C': 213.62, 'epsilon': 5.3, 'kernel': 'linear', 'degree': 4}. Best is trial 0 with value: 6.54.
[I 2023-08-24 10:57:29,999] Trial 1 finished with value: 6.58 and parameters: {'C': 408.39, 'epsilon': 7.1000000000000005, 'kernel': 'linear', 'degree': 4}. Best is trial 0 with value: 6.54.
[I 2023-08-24 10:57:33,419] Trial 2 finished with value: 6.57 and parameters: {'C': 143.26999999999998, 'epsilon': 6.5, 'kernel': 'linear', 'degree': 4}. Best is trial 0 with value: 6.54.
[I 2023-08-24 10:57:44,468] Trial 3 finished with value: 6.57 and parameters: {'C': 397.51, 'epsilon': 6.0, 'kernel': 'linear', 'degree': 3}. Best is trial 0 with value: 6.54.
[I 2023-08-24 10:57:45,049] Trial 4 finished with value: 6.48 and parameters: {'C': 16.700000000000003, 'epsilon': 5.6000000000000005, 'kernel': 'linear', 'degree': 3}. Best is trial 4 with value: 6.48.
[I 2023-08-24 10:57:48,349] Trial 5 finished with value: 6.54 and parameters: {'C': 125.23, 'epsilon': 5.9, 'kernel': 'linear', 'degree': 4}. Best is trial 4 with value: 6.48.
[I 2023-08-24 10:57:52,252] Trial 6 finished with value: 6.58 and parameters: {'C': 159.75, 'epsilon': 7.800000000000001, 'kernel': 'linear', 'degree': 3}. Best is trial 4 with value: 6.48.
[I 2023-08-24 10:58:10,214] Trial 7 finished with value: 6.58 and parameters: {'C': 483.48, 'epsilon': 7.4, 'kernel': 'linear', 'degree': 3}. Best is trial 4 with value: 6.48.
[I 2023-08-24 10:58:26,585] Trial 8 finished with value: 6.57 and parameters: {'C': 547.78, 'epsilon': 6.1000000000000005, 'kernel': 'linear', 'degree': 3}. Best is trial 4 with value: 6.48.
[I 2023-08-24 10:58:38,683] Trial 9 finished with value: 6.58 and parameters: {'C': 300.98, 'epsilon': 6.6000000000000005, 'kernel': 'linear', 'degree': 4}. Best is trial 4 with value: 6.48.
[I 2023-08-24 10:58:39,567] Trial 10 finished with value: 6.45 and parameters: {'C': 15.38, 'epsilon': 3.3000000000000003, 'kernel': 'linear', 'degree': 3}. Best is trial 10 with value: 6.45.
[I 2023-08-24 10:58:39,783] Trial 11 finished with value: 6.43 and parameters: {'C': 0.98, 'epsilon': 3.0, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 10:58:40,803] Trial 12 finished with value: 6.45 and parameters: {'C': 18.970000000000002, 'epsilon': 2.8, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 10:58:41,362] Trial 13 finished with value: 6.44 and parameters: {'C': 8.24, 'epsilon': 3.0, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 10:58:45,599] Trial 14 finished with value: 6.52 and parameters: {'C': 103.81, 'epsilon': 1.2, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 10:58:54,621] Trial 15 finished with value: 6.51 and parameters: {'C': 241.85999999999999, 'epsilon': 3.7, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 10:58:58,383] Trial 16 finished with value: 6.52 and parameters: {'C': 86.05000000000001, 'epsilon': 1.7000000000000002, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 10:59:10,885] Trial 17 finished with value: 6.52 and parameters: {'C': 299.58, 'epsilon': 2.3, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 10:59:13,021] Trial 18 finished with value: 6.51 and parameters: {'C': 61.12, 'epsilon': 4.5, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 10:59:20,379] Trial 19 finished with value: 6.52 and parameters: {'C': 180.63, 'epsilon': 4.2, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 10:59:31,421] Trial 20 finished with value: 6.52 and parameters: {'C': 295.82, 'epsilon': 2.6, 'kernel': 'linear', 'degree': 4}. Best is trial 11 with value: 6.43.
[I 2023-08-24 10:59:32,049] Trial 21 finished with value: 6.44 and parameters: {'C': 10.86, 'epsilon': 3.5, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 10:59:35,214] Trial 22 finished with value: 6.5 and parameters: {'C': 60.769999999999996, 'epsilon': 4.7, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 10:59:35,416] Trial 23 finished with value: 6.43 and parameters: {'C': 0.89, 'epsilon': 3.2, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 10:59:38,551] Trial 24 finished with value: 6.5 and parameters: {'C': 73.54, 'epsilon': 2.2, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 10:59:41,629] Trial 25 finished with value: 6.49 and parameters: {'C': 63.86, 'epsilon': 3.2, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 10:59:42,003] Trial 26 finished with value: 6.44 and parameters: {'C': 4.96, 'epsilon': 3.9000000000000004, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 10:59:49,710] Trial 27 finished with value: 6.52 and parameters: {'C': 199.7, 'epsilon': 1.7000000000000002, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 10:59:53,809] Trial 28 finished with value: 6.53 and parameters: {'C': 109.66000000000001, 'epsilon': 5.0, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:00:02,569] Trial 29 finished with value: 6.5 and parameters: {'C': 230.49, 'epsilon': 2.8, 'kernel': 'linear', 'degree': 4}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:00:05,437] Trial 30 finished with value: 6.49 and parameters: {'C': 43.08, 'epsilon': 1.9, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:00:05,736] Trial 31 finished with value: 6.44 and parameters: {'C': 2.59, 'epsilon': 3.3000000000000003, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:00:07,722] Trial 32 finished with value: 6.48 and parameters: {'C': 47.32, 'epsilon': 3.7, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:00:14,287] Trial 33 finished with value: 6.51 and parameters: {'C': 125.65, 'epsilon': 3.1, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:00:14,472] Trial 34 finished with value: 6.44 and parameters: {'C': 0.38, 'epsilon': 3.9000000000000004, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:00:35,825] Trial 35 finished with value: 6.52 and parameters: {'C': 393.56, 'epsilon': 2.3, 'kernel': 'linear', 'degree': 4}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:00:37,872] Trial 36 finished with value: 6.49 and parameters: {'C': 40.74, 'epsilon': 4.300000000000001, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:00:44,766] Trial 37 finished with value: 6.52 and parameters: {'C': 151.73999999999998, 'epsilon': 1.2, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:00:48,756] Trial 38 finished with value: 6.52 and parameters: {'C': 94.69000000000001, 'epsilon': 5.1000000000000005, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:01:13,048] Trial 39 finished with value: 6.51 and parameters: {'C': 596.4, 'epsilon': 2.8, 'kernel': 'linear', 'degree': 4}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:01:28,856] Trial 40 finished with value: 6.52 and parameters: {'C': 445.12, 'epsilon': 3.6, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:01:30,213] Trial 41 finished with value: 6.47 and parameters: {'C': 27.930000000000003, 'epsilon': 4.0, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:01:30,592] Trial 42 finished with value: 6.43 and parameters: {'C': 5.5, 'epsilon': 3.5, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:01:32,373] Trial 43 finished with value: 6.47 and parameters: {'C': 44.9, 'epsilon': 3.5, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:01:37,871] Trial 44 finished with value: 6.51 and parameters: {'C': 128.34, 'epsilon': 3.0, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:01:41,909] Trial 45 finished with value: 6.5 and parameters: {'C': 91.44000000000001, 'epsilon': 2.5, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:01:42,981] Trial 46 finished with value: 6.46 and parameters: {'C': 23.990000000000002, 'epsilon': 3.4000000000000004, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:01:56,346] Trial 47 finished with value: 6.53 and parameters: {'C': 358.94, 'epsilon': 4.7, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:02:00,071] Trial 48 finished with value: 6.49 and parameters: {'C': 76.41000000000001, 'epsilon': 3.0, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:02:01,723] Trial 49 finished with value: 6.47 and parameters: {'C': 30.67, 'epsilon': 2.1, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:02:05,016] Trial 50 finished with value: 6.53 and parameters: {'C': 108.22000000000001, 'epsilon': 5.6000000000000005, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:02:05,459] Trial 51 finished with value: 6.44 and parameters: {'C': 6.91, 'epsilon': 3.8000000000000003, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:02:05,637] Trial 52 finished with value: 6.45 and parameters: {'C': 0.54, 'epsilon': 4.2, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:02:07,094] Trial 53 finished with value: 6.46 and parameters: {'C': 27.500000000000004, 'epsilon': 2.6, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:02:10,084] Trial 54 finished with value: 6.49 and parameters: {'C': 61.48, 'epsilon': 4.0, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:02:11,146] Trial 55 finished with value: 6.45 and parameters: {'C': 23.290000000000003, 'epsilon': 3.5, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:02:14,766] Trial 56 finished with value: 6.49 and parameters: {'C': 79.26, 'epsilon': 3.0, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:02:23,175] Trial 57 finished with value: 6.52 and parameters: {'C': 175.94, 'epsilon': 4.4, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:02:32,768] Trial 58 finished with value: 6.53 and parameters: {'C': 268.36, 'epsilon': 4.7, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:02:35,611] Trial 59 finished with value: 6.49 and parameters: {'C': 55.51, 'epsilon': 3.2, 'kernel': 'linear', 'degree': 3}. Best is trial 11 with value: 6.43.
[I 2023-08-24 11:02:35,804] Trial 60 finished with value: 6.42 and parameters: {'C': 0.3, 'epsilon': 2.5, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:02:36,010] Trial 61 finished with value: 6.43 and parameters: {'C': 0.59, 'epsilon': 2.6, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:02:37,839] Trial 62 finished with value: 6.48 and parameters: {'C': 40.54, 'epsilon': 2.0, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:02:38,797] Trial 63 finished with value: 6.46 and parameters: {'C': 17.05, 'epsilon': 2.5, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:02:38,977] Trial 64 finished with value: 6.46 and parameters: {'C': 0.09, 'epsilon': 2.6, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:02:42,634] Trial 65 finished with value: 6.51 and parameters: {'C': 68.4, 'epsilon': 1.6, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:02:45,039] Trial 66 finished with value: 6.47 and parameters: {'C': 42.17, 'epsilon': 2.8, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:02:45,631] Trial 67 finished with value: 6.54 and parameters: {'C': 21.23, 'epsilon': 6.800000000000001, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:02:48,146] Trial 68 finished with value: 6.48 and parameters: {'C': 53.089999999999996, 'epsilon': 2.3, 'kernel': 'linear', 'degree': 4}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:02:48,910] Trial 69 finished with value: 6.45 and parameters: {'C': 16.610000000000003, 'epsilon': 3.3000000000000003, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:03:14,086] Trial 70 finished with value: 6.53 and parameters: {'C': 514.25, 'epsilon': 1.8, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:03:14,889] Trial 71 finished with value: 6.45 and parameters: {'C': 12.81, 'epsilon': 3.7, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:03:16,815] Trial 72 finished with value: 6.49 and parameters: {'C': 39.44, 'epsilon': 1.4, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:03:20,567] Trial 73 finished with value: 6.5 and parameters: {'C': 78.82000000000001, 'epsilon': 3.1, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:03:20,988] Trial 74 finished with value: 6.44 and parameters: {'C': 5.46, 'epsilon': 2.8, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:03:23,718] Trial 75 finished with value: 6.49 and parameters: {'C': 60.61, 'epsilon': 4.0, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:03:27,379] Trial 76 finished with value: 6.5 and parameters: {'C': 94.49000000000001, 'epsilon': 3.4000000000000004, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:03:28,914] Trial 77 finished with value: 6.47 and parameters: {'C': 32.17, 'epsilon': 2.4000000000000004, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:03:33,357] Trial 78 finished with value: 6.5 and parameters: {'C': 113.4, 'epsilon': 2.9000000000000004, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:03:45,982] Trial 79 finished with value: 6.51 and parameters: {'C': 338.51, 'epsilon': 3.5, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:03:51,800] Trial 80 finished with value: 6.5 and parameters: {'C': 133.75, 'epsilon': 2.7, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:03:52,075] Trial 81 finished with value: 6.44 and parameters: {'C': 1.42, 'epsilon': 3.2, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:03:53,306] Trial 82 finished with value: 6.45 and parameters: {'C': 18.42, 'epsilon': 3.8000000000000003, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:03:55,232] Trial 83 finished with value: 6.47 and parameters: {'C': 34.97, 'epsilon': 3.3000000000000003, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:03:56,778] Trial 84 finished with value: 6.58 and parameters: {'C': 52.72, 'epsilon': 8.0, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:03:57,619] Trial 85 finished with value: 6.45 and parameters: {'C': 14.65, 'epsilon': 3.6, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:04:00,624] Trial 86 finished with value: 6.46 and parameters: {'C': 31.450000000000003, 'epsilon': 3.0, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:04:00,859] Trial 87 finished with value: 6.44 and parameters: {'C': 1.99, 'epsilon': 4.2, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:04:03,758] Trial 88 finished with value: 6.49 and parameters: {'C': 71.80000000000001, 'epsilon': 3.4000000000000004, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:04:05,563] Trial 89 finished with value: 6.48 and parameters: {'C': 44.04, 'epsilon': 3.9000000000000004, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:04:06,558] Trial 90 finished with value: 6.45 and parameters: {'C': 15.0, 'epsilon': 2.2, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:04:07,795] Trial 91 finished with value: 6.46 and parameters: {'C': 27.060000000000002, 'epsilon': 3.7, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:04:08,614] Trial 92 finished with value: 6.46 and parameters: {'C': 11.58, 'epsilon': 4.1, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:04:08,842] Trial 93 finished with value: 6.43 and parameters: {'C': 0.78, 'epsilon': 3.1, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:04:11,551] Trial 94 finished with value: 6.48 and parameters: {'C': 51.54, 'epsilon': 3.1, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:04:12,825] Trial 95 finished with value: 6.46 and parameters: {'C': 29.51, 'epsilon': 2.6, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:04:13,047] Trial 96 finished with value: 6.43 and parameters: {'C': 0.68, 'epsilon': 2.9000000000000004, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:04:13,279] Trial 97 finished with value: 6.43 and parameters: {'C': 0.98, 'epsilon': 2.9000000000000004, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:04:14,183] Trial 98 finished with value: 6.45 and parameters: {'C': 13.95, 'epsilon': 2.9000000000000004, 'kernel': 'linear', 'degree': 4}. Best is trial 60 with value: 6.42.
[I 2023-08-24 11:04:17,964] Trial 99 finished with value: 6.5 and parameters: {'C': 86.69000000000001, 'epsilon': 2.5, 'kernel': 'linear', 'degree': 3}. Best is trial 60 with value: 6.42.
Best parameters: {'C': 0.3, 'epsilon': 2.5, 'kernel': 'linear', 'degree': 3}
Best metric value: 6.42
| | C | epsilon | kernel | degree | score |
|---|---|---|---|---|---|
| 0 | 0.3 | 2.5 | linear | 3.0 | 6.42 |
<timed exec>:54: ExperimentalWarning: plot_param_importances is experimental (supported from v2.2.0). The interface can change in the future.
<timed exec>:70: ExperimentalWarning: plot_optimization_history is experimental (supported from v2.2.0). The interface can change in the future.
Wall time: 7min 12s
svr
SVR(C=0.3, epsilon=2.5, kernel='linear')
def calculate_metric(model):
    # Compute the mean absolute error via cross-validation
    scores = cross_val_score(model,
                             features_train,
                             target_train,
                             cv=kf,
                             scoring='neg_mean_absolute_error',
                             n_jobs=-1)
    metric = round(abs(scores.mean()), 2)
    return metric
metrics = {}
metrics['LinearRegression'] = calculate_metric(regressor)
metrics['RandomForestRegressor'] = calculate_metric(forest)
metrics['LGBMRegressor'] = calculate_metric(lgbm)
metrics['CatBoostRegressor'] = calculate_metric(catboost)
metrics['SVR'] = calculate_metric(svr)
models = list(metrics.keys())
values = list(metrics.values())
colors = ['blue', 'green', 'red', 'purple', 'yellow']
# Plot the metric comparison
plt.bar(models, values, color=colors)
plt.xlabel('Model',
           fontsize=12,
           color='DarkSlateGray')
plt.ylabel('Metric',
           fontsize=12,
           color='DarkSlateGray')
plt.title('Metric comparison',
          fontsize=15,
          color='DarkSlateGray')
plt.xticks(rotation=90)
for i, v in enumerate(values):
    plt.text(i, v, str(v), ha='center', va='bottom')
plt.show()
All models performed well on cross-validation. The best one is LGBMRegressor (lgbm).
model = DummyRegressor()
model.fit(features_train, target_train)
dummy_mae_test = round(mae(target_test, model.predict(features_test)), 2)
print('MAE of the simple baseline regressor =', dummy_mae_test)
MAE of the simple baseline regressor = 8.69
lgbm.fit(features_train, target_train)
lgbm_mae_test = round(mae(target_test, lgbm.predict(features_test)), 2)
print('MAE on the test set =', lgbm_mae_test)
MAE on the test set = 6.62
Our model outperformed both the baseline that always predicts the mean of the target variable and the simple DummyRegressor baseline.
The resulting mean absolute error is acceptable, so we accept this model as meeting the requirements.
lgbm
LGBMRegressor(colsample_bytree=0.7, learning_rate=0.09, max_depth=2,
min_child_samples=7, n_estimators=276, random_state=140823,
subsample=0.76)
importance = lgbm.feature_importances_
# Sort the importance indices in descending order
sorted_indices = np.argsort(importance)[::-1]
# Feature names in sorted order
sorted_features = features_train.columns[sorted_indices]
# Importance values in sorted order
sorted_importance = importance[sorted_indices]
# Total importance of all features
total_importance = np.sum(sorted_importance)
# Importance of each feature as a percentage
sorted_importance_percent = (sorted_importance / total_importance) * 100
# Build the chart
plt.figure(figsize=(15, 5))
# Bar chart of the sorted features and their importance
plt.bar(sorted_features, sorted_importance_percent)
plt.xticks(rotation='vertical')
plt.xlabel('Feature',
           fontsize=12,
           color='DarkSlateGray')
plt.ylabel('Importance (%)',
           fontsize=12,
           color='DarkSlateGray')
plt.title('Feature importance',
          fontsize=15,
          color='DarkSlateGray')
plt.minorticks_on()
plt.grid(which='minor',
         linestyle=':')
plt.grid(True)
plt.show()
print("Feature importance table:")
df_importance = pd.DataFrame({"Feature": sorted_features,
                              "Importance (%)": sorted_importance_percent}).style.background_gradient('coolwarm')
df_importance
Feature importance table:
| | Feature | Importance (%) |
|---|---|---|
| 0 | energy_consumption | 12.816300 |
| 1 | temperature_first_measurement | 12.421952 |
| 2 | time_between_measurements | 9.431482 |
| 3 | bulk_14 | 8.839961 |
| 4 | wire_1 | 8.149852 |
| 5 | gas_quantities | 8.149852 |
| 6 | duration_bulk | 6.539599 |
| 7 | bulk_12 | 5.619454 |
| 8 | wire_sum | 5.455143 |
| 9 | bulk_6 | 4.042064 |
| 10 | wire_2 | 3.746303 |
| 11 | bulk_15 | 3.450542 |
| 12 | bulk_3 | 3.220506 |
| 13 | duration_wire | 2.136050 |
| 14 | bulk_1 | 1.840289 |
| 15 | bulk_4 | 1.675978 |
| 16 | bulk_11 | 0.953007 |
| 17 | bulk_10 | 0.525797 |
| 18 | wire_6 | 0.492935 |
| 19 | bulk_5 | 0.262898 |
| 20 | bulk_7_wire_4_bulk_2 | 0.131449 |
| 21 | wire_3 | 0.098587 |
| 22 | wire_7 | 0.000000 |
| 23 | bulk_13 | 0.000000 |
| 24 | bulk_9_wire_8 | 0.000000 |
| 25 | wire_9 | 0.000000 |
Based on the feature-importance analysis, the following conclusions can be drawn.

The features with the greatest influence on the target prediction are:

- temperature_first_measurement — temperature at the start of processing;
- energy_consumption — energy consumed during processing;
- time_between_measurements — total heating time.

They are followed by the material and process features: bulk_14 (bulk material), wire_1 (wire material), gas_quantities (volume of purge gas), duration_bulk (total bulk-feeding time), bulk_12, wire_sum (total volume of wire materials fed), bulk_6, wire_2, bulk_15, bulk_3, duration_wire (total wire-feeding time), bulk_1, bulk_4, bulk_11, bulk_10, wire_6, bulk_5, the combined feature bulk_7_wire_4_bulk_2 and wire_3. The features bulk_13, bulk_9_wire_8, wire_7 and wire_9 have zero importance.

Thus, the most important features for predicting the target variable are the temperature at the first measurement, the energy consumption, the processing time, and the data on certain bulk and wire materials.
The task before us: to build a model that predicts the steel temperature at the final processing stage in order to optimize electricity consumption at the metallurgical plant.
We were provided with csv files obtained from different sources:

- data_arc_new.csv — electrode data;
- data_bulk_new.csv — bulk material feed data (volume);
- data_bulk_time_new.csv — bulk material feed data (time);
- data_gas_new.csv — alloy gas-purging data;
- data_temp_new.csv — temperature measurement results;
- data_wire_new.csv — wire material data (volume);
- data_wire_time_new.csv — wire material data (time).

In all files, the key column contains the batch number.
A rough work plan was drawn up:
The following libraries and modules were used to carry out the task:
As required by the customer, RANDOM_STATE = 140823.

While reviewing the data, the following was established:
df_arc
The data contain information about the start and end of arc heating, as well as the active and reactive power, for each key value. In total there are 14876 records.
The column data types are correct.

df_bulk
The data contain information about the addition of bulk materials during processing. In total there are 3129 records. Columns Bulk 1…Bulk 15 give the volumes of material added for each key value. Most columns contain missing values, meaning that not every material was added for every key value.
df_bulk_time
The table contains data on the feed times of bulk materials. It has 3129 rows and 16 columns, just like df_bulk. The key column holds the key of each record, and columns Bulk 1…Bulk 15 hold the timestamps of the corresponding Bulk operations. Some columns contain missing values. Several Bulk operations are performed per key, and they differ in kind and number from one key to another (Bulk 14 was performed 2806 times, Bulk 8 only once).
The data in columns Bulk 1…Bulk 15 are stored as object.
df_gas
The data describe the gas purging of the alloy. The dataset contains 3239 rows and 2 columns: key and Газ 1. The key column is the batch number, and Газ 1 contains numeric values for the volume of purge gas. Both columns have correct data types.
df_temp
The file contains temperature measurements taken at different points in time. The dataset has 18092 rows and 3 columns:

- key — batch number;
- Время замера — date and time of the measurement;
- Температура — numeric temperature value.

The key column is a unique identifier used to join with the other tables. The Время замера column holds the exact date and time of the temperature measurement, and the Температура column holds the measured temperature.

The Температура column contains missing values (NaN) in some rows (14665 non-null values out of 18092).
The Время замера column was stored as object and was converted to datetime for further analysis.

During data preprocessing, the following was established:
In df_arc the records span 2019-05-03 11:02:14 to 2019-09-06 17:26:15, with 3214 batches (batches are processed repeatedly, hence 14876 records).
Active power ranges from 0.223120 to 1.463773; reactive power from -715.479924 to 1.270284. There was a single negative value, and it looked implausible: its magnitude was of a higher order (e+02, while the rest lie in the range e-01 to e+00).
A strong positive correlation (0.97) was found between active and reactive power. This made it possible to fix the incorrect reactive-power value by replacing it with the product of the active power and the mean ratio of reactive to active power.
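The correction described above can be sketched as follows. This is a minimal illustration on toy data; the column names `active_power` and `reactive_power` are assumptions (the real notebook renames the Russian columns to English snake_case).

```python
import pandas as pd

# Toy frame: one reactive-power reading is clearly corrupted
# (column names are hypothetical English equivalents)
df_arc = pd.DataFrame({
    'active_power':   [0.5, 0.8, 1.2, 0.9],
    'reactive_power': [0.4, 0.6, -715.48, 0.7],
})

# Mean ratio of reactive to active power over the valid (positive) rows
valid = df_arc['reactive_power'] > 0
ratio = (df_arc.loc[valid, 'reactive_power']
         / df_arc.loc[valid, 'active_power']).mean()

# Replace the anomalous value with active_power * mean ratio
bad = ~valid
df_arc.loc[bad, 'reactive_power'] = df_arc.loc[bad, 'active_power'] * ratio
```

This keeps the corrected value on the same scale as the rest of the column instead of dropping the record.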
Analyzing the number of treatments and batches at the steel plant, the following was observed:

Thus, most batches at the steel plant go through 4 or 5 treatments, and the average number of treatments per batch is 4.63.
There are no records between July 13 and July 18 (most likely an accident or equipment maintenance, i.e. something that halted production).
On average, the electrode heating process lasts about 3.5 minutes; the minimum treatment duration is 11 seconds and the maximum is slightly over 15 minutes.
We computed the consumed energy by multiplying the arc duration in seconds (the difference between the arc end time and start time) by the apparent power $S$:
$$S=\sqrt{P^2+Q^2}$$
where $P$ is the active power and $Q$ is the reactive power.
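A sketch of this energy calculation on toy data (the column names `start`, `end`, `active_power`, `reactive_power` are assumptions, not the original headers):

```python
import numpy as np
import pandas as pd

# Hypothetical arc records: start/end of heating plus active (P) and reactive (Q) power
df = pd.DataFrame({
    'start': pd.to_datetime(['2019-05-03 11:00:00', '2019-05-03 11:10:00']),
    'end':   pd.to_datetime(['2019-05-03 11:03:30', '2019-05-03 11:12:00']),
    'active_power':   [0.8, 1.1],
    'reactive_power': [0.6, 0.5],
})

# Apparent power S = sqrt(P^2 + Q^2)
s = np.sqrt(df['active_power']**2 + df['reactive_power']**2)

# Arc duration in seconds times apparent power -> consumed energy
duration = (df['end'] - df['start']).dt.total_seconds()
df['energy_consumption'] = duration * s
```

For the first toy row, S = sqrt(0.8² + 0.6²) = 1.0 and the arc lasts 210 s, so the energy is 210.0.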
Only one batch was recorded for Bulk 8, versus 2806 for Bulk 14. The mean bulk-material feed volumes vary from 39.24 to 305.6.

The standard deviation of the feed volume lies between 18.28 and 191.02, which indicates the spread of the data within each batch.

The minimum feed volumes range from 6 to 49 and the maximum values from 185 to 772, which points to a considerable gap between the minimum and maximum values within each batch.

The median feed volumes range from 31 to 298.

The first-quartile (25%) values vary from 27 to 406, and the third-quartile (75%) values from 46 to 205, which indicates a large spread in the data and substantial differences between the lower and upper quartiles.

Overall, the bulk-material feed data (volume) show considerable variability within each batch and differences between batches, which is tied to the wide range of products (different production conditions and process requirements).
In columns Bulk_5 and Bulk_12 of data_bulk_new.csv, single values were found that stand far apart from the rest of the distribution (anomalies).
The columns Bulk_1–Bulk_15 of data_bulk_time_new.csv were replaced with the duration of the bulk-material feed process (which is more informative).
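One plausible way to derive that duration is the span between the first and last bulk addition per batch. This is a sketch under that assumption; the toy column names (`bulk_1`, `bulk_2`, `duration_bulk`) mirror the notebook's snake_case convention:

```python
import pandas as pd

# Hypothetical slice of data_bulk_time_new.csv: timestamps of bulk additions per batch
df_bulk_time = pd.DataFrame({
    'key': [1, 2],
    'bulk_1': pd.to_datetime(['2019-05-03 11:00:00', '2019-05-03 12:00:00']),
    'bulk_2': pd.to_datetime(['2019-05-03 11:05:00', None]),  # NaT = material not added
})

time_cols = ['bulk_1', 'bulk_2']
# Duration = span between the earliest and latest bulk addition, in seconds
first = df_bulk_time[time_cols].min(axis=1)
last = df_bulk_time[time_cols].max(axis=1)
df_bulk_time['duration_bulk'] = (last - first).dt.total_seconds()
```

A batch with a single addition gets a duration of 0, which is consistent with replacing missing feeds by zeros later on.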
The maximum duration of processing one steel batch with bulk materials is 13683 s (about 4 hours).
The data_gas_new.csv table contains 3239 observations (the number of batches), while the electrode data contain 3214. The mean gas quantity is approximately 11. The standard deviation is about 6.22, indicating relatively high variation. The minimum gas quantity is 0.008399 and the maximum is 77.995040. The first quartile (25%) is 7.043089, the median (50%) is 9.836267, and the third quartile (75%) is 13.769915.

From this we conclude that the gas-quantity measurements span a wide range of values around a mean of roughly 11.002062. The relatively high standard deviation points to substantial variability. More batches of metal were treated with gas than were melted by the electrodes (a feature of the technology).
The data_temp_new.csv table contains 3216 batches (unique values in the key column), which does not match any other table, although batch numbers go up to 3241, as in the gas-purging data. The key column has 14876 duplicate values out of 18092 rows.
Five temperature values were found that stand apart from the rest: 1191, 1204, 1208, 1218, 1227. Such temperatures are more typical of quenching and do not match the technological process.
Most batches had 6 or 5 temperature measurements (27.74% and 23.6% of all batches respectively).

Batches with 4 or 3 measurements make up a sizeable share (16.17% and 5.41% respectively).

Batches with 7 or 8 measurements also make up a sizeable share (15.24% and 6.37% respectively).

Batches with more than 8 measurements form a smaller share (about 4% of all batches in total); that many measurements are needed only in special cases.

Batches with 1 measurement, or more than 11, form a negligible share (just 0.31% of all batches).

Thus, the standard number of temperature measurements per batch is 5–6, and it is the most common; everything outside this range can be treated as an exception or special case. Rows where the number of measurements was below two were removed.
Wire 1, Wire 2 and Wire 3 have the largest number of observations (3055, 1079 and 63 respectively), while Wire 4–Wire 9 have far fewer. The differences in means and standard deviations between the wires point to different characteristics and properties of these materials. Wire 6 was used only once.
The columns Wire 1–Wire 9 of data_wire_time_new.csv were replaced with the duration of the wire-material feed process (which is more informative).
The maximum duration of processing one steel batch with wire materials is 5937 s (about 1.6 hours).
Missing values in the provided tables were replaced with zeros, since they stem from the technology (a missing value means the material was not added).
Column names in the tables were translated into English and converted to snake_case.
During preprocessing, new features were added:

The data in the tables were aggregated by key.
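Aggregation by key might look like the following sketch (toy data; the aggregate names `energy_consumption` summed per batch and `count_arc` as the number of treatments follow the features mentioned later in the text, but the exact aggregation functions are assumptions):

```python
import pandas as pd

# Hypothetical per-iteration arc data: batch 1 was treated twice, batch 2 once
df_arc = pd.DataFrame({
    'key': [1, 1, 2],
    'energy_consumption': [100.0, 150.0, 80.0],
})

# One row per batch: total energy and number of treatments
agg = df_arc.groupby('key').agg(
    energy_consumption=('energy_consumption', 'sum'),
    count_arc=('energy_consumption', 'count'),
).reset_index()
```

After each source table is reduced to one row per key like this, the tables can be joined on key into a single training frame.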
During the analysis, points arose that were clarified and agreed with the customer's representative (a list of questions was compiled).

List of questions:
- data_arc_new.csv (electrode data): why are there no records between July 13 and July 18?
- data_bulk_new.csv (bulk material feed, volume):
  - Are Bulk 1…Bulk 15 different materials?
  - Was Bulk 8 really used only once between 2019-05-03 and 2019-09-06?
  - Bulk 5 has a single value of 603, while the values beyond the outlier boundary of the distribution are 234, 242, 256, 293. Is that normal?
  - Bulk 12 has a single value of 1849, while the values beyond the outlier boundary are 496–853. Is that normal?

After the information was clarified, the tables were cleaned up further (rare single cases and anomalous values were removed).
We built the final training table by joining all the tables and keeping only those batches for which information is available for every stage of alloy preparation. The reason for this choice is that every batch goes through all the preparation stages.
Since the target is the last measured temperature, we checked that no further operations were performed on the alloy after the final temperature measurement. After that, the datetime columns were dropped.
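Extracting the first measurement (a feature) and the last measurement (the target) per batch can be sketched as follows; the column names are illustrative assumptions consistent with the feature names used later (`temperature_first_measurement`):

```python
import pandas as pd

# Hypothetical temperature log with measurement timestamps
df_temp = pd.DataFrame({
    'key': [1, 1, 1, 2, 2],
    'measurement_time': pd.to_datetime([
        '2019-05-03 11:00', '2019-05-03 11:20', '2019-05-03 11:40',
        '2019-05-03 12:00', '2019-05-03 12:30',
    ]),
    'temperature': [1601.0, 1595.0, 1598.0, 1580.0, 1590.0],
})

# Sort within each batch by time, then take the first and last readings
df_temp = df_temp.sort_values(['key', 'measurement_time'])
grouped = df_temp.groupby('key')['temperature']
targets = pd.DataFrame({
    'temperature_first_measurement': grouped.first(),
    'temperature_last': grouped.last(),  # target variable
}).reset_index()
```

Once first/last values are pulled out, the raw datetime columns are no longer needed and can be dropped, as described above.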
The strong positive correlation between active and reactive power was noted at the very start of data processing, and the apparent power, which combines both parameters, was computed. We determined the consumed energy by multiplying the arc duration by the apparent power, so the consumed energy combines all four parameters. We kept only the consumed energy, so as not to create redundant relationships between features and confuse the model.
We visualized the correlations between the strongly correlated variables. It shows that some variables are highly correlated with each other: for example, energy_consumption correlates strongly with count_arc (0.71) and bulk_sum (0.5), and bulk_sum in turn correlates strongly with bulk_12 (0.87). We dropped count_arc and bulk_sum.
A close relationship is visible between bulk_7 and wire_4, and between bulk_9 and wire_8.
Such high correlation may indicate multicollinearity between these variables. Multicollinearity can make regression models unstable and ambiguous, and can complicate the interpretation of each variable's importance.
To remove the multicollinearity, we merged the pairs bulk_7–wire_4 and bulk_9–wire_8.
A correlation then appeared between bulk_7_wire_4 and bulk_2, so we merged them as well, creating bulk_7_wire_4_bulk_2.
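A minimal sketch of such a merge, assuming the collinear columns are combined by summation (the text does not state the exact combining rule, so the sum is an assumption):

```python
import pandas as pd

# Hypothetical feature frame with collinear material columns
df = pd.DataFrame({
    'bulk_7': [10.0, 0.0],
    'wire_4': [5.0, 0.0],
    'bulk_2': [2.0, 1.0],
})

# Collapse the correlated columns into a single combined feature
df['bulk_7_wire_4_bulk_2'] = df['bulk_7'] + df['wire_4'] + df['bulk_2']
df = df.drop(columns=['bulk_7', 'wire_4', 'bulk_2'])
```

Replacing a correlated group with one combined column removes the redundant pairwise correlations while keeping the total material signal.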
The target shows no direct dependence on any single parameter; without building a model, one cannot say what the temperature at the final stage depends on.
To train the models we prepared training and test samples (a $\frac{2}{3}$ / $\frac{1}{3}$ split).

We separated the target from the feature columns.

Since the features have different scales, we fitted a standardizer on the training data and then applied the same transformation to the test data.
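The split-then-standardize step above can be sketched as follows (synthetic features stand in for the real table; the variable names match the notebook's `features_train`/`features_test` convention):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

RANDOM_STATE = 140823

# Synthetic stand-in for the prepared feature table and target
rng = np.random.default_rng(RANDOM_STATE)
features = pd.DataFrame(rng.normal(size=(90, 3)), columns=['f1', 'f2', 'f3'])
target = pd.Series(rng.normal(loc=1590, scale=10, size=90))

# 2/3 train, 1/3 test
features_train, features_test, target_train, target_test = train_test_split(
    features, target, test_size=1/3, random_state=RANDOM_STATE)

# Fit the scaler on the training data ONLY, then apply it to both splits
scaler = StandardScaler().fit(features_train)
features_train = pd.DataFrame(scaler.transform(features_train), columns=features.columns)
features_test = pd.DataFrame(scaler.transform(features_test), columns=features.columns)
```

Fitting the scaler on the training split only avoids leaking test-set statistics into training.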
We computed the mean absolute error (MAE = 7.88) for a model that always predicts the mean of the target variable, obtaining a baseline metric for comparison with the other models. This lets us judge how much better the other models are at forecasting than this "naive" model.
We created a KFold object to split the dataset for cross-validation.
To tune the hyperparameters we used Optuna, a library for automatic model-parameter optimization. We ran the search in stages, narrowing the parameter intervals each time, guided by visualizations (tables of the best-scoring parameter sets, hyperparameter-importance plots, optimization-history plots, and contour plots for pairs of parameters).
All models scored better than the "naive" baseline; moreover, they satisfy the customer's requirement $MAE \le 6.8$.
To choose the best model, we visualized the metrics obtained on cross-validation. The best is LGBMRegressor with parameters:

- colsample_bytree: 0.7
- learning_rate: 0.09
- max_depth: 2
- min_child_samples: 7
- n_estimators: 276
- random_state: 140823
- subsample: 0.76

The best model was tested on the held-out sample, yielding $MAE = 6.62$, which satisfies the customer's requirement $MAE \le 6.8$. In addition, it was compared with the simple DummyRegressor baseline, which gave $MAE = 8.69$.
The test results show that we managed to build a solid model that meets the customer's requirements.
We investigated how strongly the features influence the target variable.
Based on the feature-importance analysis, the following conclusions can be drawn.

Features with the greatest influence on the target prediction:

- temperature_first_measurement — temperature at the start of processing;
- energy_consumption — energy consumed during processing;
- time_between_measurements — total heating time.

Features with a substantial influence on the target:

- bulk_14 — bulk material;
- wire_1 — wire material;
- gas_quantities — volume of purge gas;
- duration_bulk — total bulk-feeding time;
- bulk_12 — bulk material;
- wire_sum — total volume of wire materials fed;
- bulk_6 — bulk material;
- wire_2 — wire material;
- bulk_15, bulk_3 — bulk materials;
- duration_wire — total wire-feeding time;
- bulk_1, bulk_4, bulk_11 — bulk materials.

Features with little influence:

- bulk_10, bulk_5, bulk_7, bulk_2 — bulk materials;
- wire_6, wire_4, wire_3 — wire materials.

Features with no influence:

- bulk_9, bulk_13 — bulk materials;
- wire_7, wire_8, wire_9 — wire materials.

Thus, the most important features for predicting the target variable are the temperature at the first measurement, the energy consumption, the processing time, and the data on certain bulk and wire materials.
The work plan was carried out in full, and the task has been solved.